Preface


Important though the general concepts and propositions may be with which 
the modern and industrious passion for axiomatizing and generalizing has 
presented us, in algebra perhaps more than anywhere else, nevertheless I am
convinced that the special problems in all their complexity constitute the 
stock and core of mathematics, and that to master their difficulties requires 
on the whole the harder labor. 


Hermann Weyl


This book began about 20 years ago in the form of supplementary notes for my alge- 
bra classes. I wanted to discuss some concrete topics such as symmetry, linear 
groups, and quadratic number fields in more detail than the text provided, and to 
shift the emphasis in group theory from permutation groups to matrix groups. Lat- 
tices, another recurring theme, appeared spontaneously. My hope was that the con- 
crete material would interest the students and that it would make the abstractions 
more understandable, in short, that they could get farther by learning both at the 
same time. This worked pretty well. It took me quite a while to decide what I 
wanted to put in, but I gradually handed out more notes and eventually began teach- 
ing from them without another text. This method produced a book which is, I think, 
somewhat different from existing ones. However, the problems I encountered while 
fitting the parts together caused me many headaches, so I can’t recommend starting 
this way. 

The main novel feature of the book is its increased emphasis on special topics. 
They tended to expand each time the sections were rewritten, because I noticed over 
the years that, with concrete mathematics in contrast to abstract concepts, students 
often prefer more to less. As a result, the ones mentioned above have become major
parts of the book. There are also several unusual short subjects, such as the
Todd-Coxeter algorithm and the simplicity of PSL2.


In writing the book, I tried to follow these principles: 


1. The main examples should precede the abstract definitions. 
2. The book is not intended for a “service course,” so technical points should be 
presented only if they are needed in the book. 


3. All topics discussed should be important for the average mathematician. 


Though these principles may sound like motherhood and the flag, I found it useful to 
have them enunciated, and to keep in mind that “Do it the way you were taught” 
isn’t one of them. They are, of course, violated here and there. 

The table of contents gives a good idea of the subject matter, except that a first 
glance may lead you to believe that the book contains all of the standard material in 
an undergraduate algebra course, and more. Looking more closely, you will find that 
things have been pared down here and there to make space for the special topics. I 
used the above principles as a guide. Thus having the main examples in hand before 
proceeding to the abstract material allowed some abstractions to be treated more 
concisely. I was also able to shorten a few discussions by deferring them until the 
students have already overcome their inherent conceptual difficulties. The discussion 
of Peano’s axioms in Chapter 10, for example, has been cut to two pages. Though 
the treatment given there is very incomplete, my experience is that it suffices to give 
the students the flavor of the axiomatic development of integer arithmetic. A more 
extensive discussion would be required if it were placed earlier in the book, and the 
time required for this wouldn’t be well spent. Sometimes the exercise of deferring 
material showed that it could be deferred forever—that it was not essential. This 
happened with dual spaces and multilinear algebra, for example, which wound up on 
the floor as a consequence of the second principle. With a few concepts, such as the 
minimal polynomial, I ended up believing that their main purpose in introductory al- 
gebra books has been to provide a convenient source of exercises. 

The chapters are organized following the order in which I usually teach a 
course, with linear algebra, group theory, and geometry making up the first 
semester. Rings are first introduced in Chapter 10, though that chapter is logically 
independent of many earlier ones. I use this unusual arrangement because I want to 
emphasize the connections of algebra with geometry at the start, and because, over- 
all, the material in the first chapters is the most important for people in other fields. 
The drawback is that arithmetic is given short shrift. This is made up for in the later 
chapters, which have a strong arithmetic slant. Geometry is brought back from time 
to time in these later chapters, in the guise of lattices, symmetry, and algebraic ge- 
ometry. 


Michael Artin 
December 1990 


A Note for the Teacher 


There are few prerequisites for this book. Students should be familiar with calculus, 
the basic properties of the complex numbers, and mathematical induction. Some ac- 
quaintance with proofs is obviously useful, though less essential. The concepts from 
topology, which are used in Chapter 8, should not be regarded as prerequisites. An 
appendix is provided as a reference for some of these concepts; it is too brief to be 
suitable as a text. 

Don’t try to cover the book in a one-year course unless your students have al- 
ready had a semester of algebra, linear algebra for instance, and are mathematically 
fairly mature. About a third of the material can be omitted without sacrificing much 
of the book’s flavor, and more can be left out if necessary. The following sections, 
for example, would make a coherent course: 


Chapter 1, Chapter 2, Chapter 3: 1-4, Chapter 4, Chapter 5: 1-7, 
Chapter 6: 1,2, Chapter 7: 1-6, Chapter 8: 1-3,5, Chapter 10: 1-7, 
Chapter 11: 1-8, Chapter 12: 1-7, Chapter 13: 1-6. 


This selection includes some of the interesting special topics: symmetry of plane 
figures, the geometry of SU2, and the arithmetic of imaginary quadratic number
fields. If you don’t want to discuss such topics, then this is not the book for you.

It would be easy to spend an entire semester on the first four chapters, but this 
would defeat the purpose of the book. Since the real fun starts with Chapter 5, it is 
important to move along. If you plan to follow the chapters in order, try to get to 
that chapter as soon as is practicable, so that it can be done at a leisurely pace. It will 
help to keep attention focussed on the concrete examples. This is especially important
in the beginning for the students who come to the course without a clear idea of
what constitutes a proof.

Chapter 1, matrix operations, isn’t as exciting as some of the later ones, so it
should be covered fairly quickly. I begin with it because I want to emphasize the
general linear group at the start, instead of following the more customary practice of
basing examples on the symmetric group. The reason for this decision is Principle 3
of the preface: The general linear group is more important.

Here are some suggestions for Chapter 2:


1. Treat the abstract material with a light touch. You can have another go at it in 
Chapters 5 and 6. 
2. For examples, concentrate on matrix groups. Mention permutation groups only in
passing. Because of their inherent notational difficulties, examples from symmetry
such as the dihedral groups are best deferred to Chapter 5.


3. Don’t spend too much time on arithmetic. Its natural place in this book is Chap- 
ters 10 and 11. 


4. Deemphasize the quotient group construction. 


Quotient groups present a pedagogical problem. While their construction is conceptually
difficult, the quotient is readily presented as the image of a homomorphism in
most elementary examples, and so it does not require an abstract definition. Modular
arithmetic is about the only convincing example for which this is not the case. And
since the integers modulo n form a ring, modular arithmetic isn’t the ideal motivat- 
ing example for quotients of groups. The first serious use of quotient groups comes 
when generators and relations are discussed in Chapter 6, and I deferred the treat- 
ment of quotients to that point in early drafts of the book. But fearing the outrage of 
the algebra community I ended up moving it to Chapter 2. Anyhow, if you don't
plan to discuss generators and relations for groups in your course, then you can defer
an in-depth treatment of quotients to Chapter 10, ring theory, where they play a
central role, and where modular arithmetic becomes a prime motivating example.

In Chapter 3, vector spaces, I’ve tried to set up the computations with bases in 
such a way that the students won’t have trouble keeping the indices straight. I've 
probably failed, but since the notation is used throughout the book, it may be advis- 
able to adopt it. 

The applications of linear operators to rotations and linear differential equa- 
tions in Chapter 4 should be discussed because they are used later on, but the temp- 
tation to give differential equations their due has to be resisted. This heresy will be 
forgiven because you are teaching an algebra course. 

There is a gradual rise in the level of sophistication which is assumed of the 
reader throughout the first chapters, and a jump which I’ve been unable to eliminate 
occurs in Chapter 5. Had it not been for this jump, I would have moved symmetry 
closer to the beginning of the book. Keep in mind that symmetry is a difficult con- 
cept. It is easy to get carried away by the material and to leave the students behind. 
Except for its first two sections, Chapter 6 contains optional material. The last
section on the Todd-Coxeter algorithm isn’t standard; it is included to justify the
discussion of generators and relations, which are pretty useless without it.

There is nothing unusual in the chapter on bilinear forms, Chapter 7. I haven't
overcome the main problem with this material, that there are too many variations on 
the same theme, but have tried to keep the discussion short by concentrating on the 
real and complex cases. 

In the chapter on linear groups, Chapter 8, plan to spend time on the geometry
of SU2. My students complained every year about this chapter until I expanded the
sections on SU2, after which they began asking for supplementary reading, wanting
to learn more. Many of our students are not familiar with the concepts from topology
when they take the course, and so these concepts require a light touch. But I’ve
found that the problems caused by the students’ lack of familiarity can be managed.
Indeed, this is a good place for them to get an idea of what a manifold is. Unfortunately,
I don’t know a really satisfactory reference for further reading.

Chapter 9 on group representations is optional. I resisted including this topic 
for a number of years, on the grounds that it is too hard. But students often request 
it, and I kept asking myself: If the chemists can teach it, why can’t we? Eventually 
the internal logic of the book won out and group representations went in. As a divi- 
dend, hermitian forms got an application. 

The unusual topic in Chapter 11 is the arithmetic of quadratic number fields.
You may find the discussion too long for a general algebra course. With this possibil- 
ity in mind, I’ve arranged the material so that the end of Section 8, ideal factoriza- 
tion, is a natural stopping point. 

It seems to me that one should at least mention the most important examples of 
fields in a beginning algebra course, so I put a discussion of function fields into 
Chapter 13. 

There is always the question of whether or not Galois theory should be pre- 
sented in an undergraduate course. It doesn’t have quite the universal applicability 
of most of the subjects in the book. But since Galois theory is a natural culmination 
of the discussion of symmetry, it belongs here as an optional topic. I usually spend at 
least some time on Chapter 14. 

I considered grading the exercises for difficulty, but found that I couldn’t do it 
consistently. So I’ve only gone so far as to mark some of the harder ones with an 
asterisk. I believe that there are enough challenging problems, but of course one al- 
ways needs more of the interesting, easier ones. 

Though I've taught algebra for many years, several aspects of this book are ex- 
perimental, and I would be very grateful for critical comments and suggestions from
the people who use it. 


“One, two, three, five, four...” 

“No Daddy, it’s one, two, three, four, five.” 

“Well if I want to say one, two, three, five, four, 
why can’t I?”

“That's not how it goes.” 
Matrix Operations


First of all, everything is called a quantity which is capable of
increase or decrease, or to which something can be added
or from which something can be taken away.


Leonhard Euler 


Matrices play a central role in this book. They form an important part of the theory, 
and many concrete examples are based on them. Therefore it is essential to develop 
facility in matrix manipulation. Since matrices pervade much of mathematics, the 
techniques needed here are sure to be useful elsewhere. 

The concepts which require practice to handle are matrix multiplication and 
determinants. 


1. THE BASIC OPERATIONS


Let $m, n$ be positive integers. An $m \times n$ matrix is a collection of $mn$ numbers arranged in a rectangular array with $m$ rows and $n$ columns:

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \tag{1.1}$$

For example, $\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix}$ is a $2 \times 3$ matrix.

The numbers in a matrix are called the matrix entries and are denoted by $a_{ij}$, where $i, j$ are indices (integers) with $1 \le i \le m$ and $1 \le j \le n$. The index $i$ is called the row index, and $j$ is the column index. So $a_{ij}$ is the entry which appears in the $i$th row and $j$th column of the matrix:

$$\begin{bmatrix} & \vdots & \\ \cdots & a_{ij} & \cdots \\ & \vdots & \end{bmatrix}$$

In the example above, $a_{11} = 2$, $a_{13} = 0$, and $a_{23} = 5$.
We usually introduce a symbol such as $A$ to denote a matrix, or we may write it as $(a_{ij})$.

A $1 \times n$ matrix is called an $n$-dimensional row vector. We will drop the index $i$ when $m = 1$ and write a row vector as

$$A = [a_1 \ \cdots \ a_n], \quad\text{or as}\quad A = (a_1, \ldots, a_n). \tag{1.2}$$

The commas in this row vector are optional. Similarly, an $m \times 1$ matrix is an $m$-dimensional column vector:

$$B = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}. \tag{1.3}$$

A $1 \times 1$ matrix $[a]$ contains a single number, and we do not distinguish such a matrix from its entry.


(1.4) Addition of matrices is vector addition:

$$(a_{ij}) + (b_{ij}) = (s_{ij}),$$

where $s_{ij} = a_{ij} + b_{ij}$ for all $i, j$. Thus

$$\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 3 \\ 4 & -3 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 1 & 3 \\ 5 & 0 & 6 \end{bmatrix}.$$

The sum of two matrices $A, B$ is defined only when they are both of the same shape, that is, when they are $m \times n$ matrices with the same $m$ and $n$.

(1.5) Scalar multiplication of a matrix by a number is defined as with vectors. The result of multiplying a number $c$ and a matrix $(a_{ij})$ is another matrix:

$$c\,(a_{ij}) = (b_{ij}),$$

where $b_{ij} = ca_{ij}$ for all $i, j$. Thus

$$2\begin{bmatrix} 0 & 1 \\ 2 & 3 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 2 \\ 4 & 6 \\ 4 & 2 \end{bmatrix}.$$

Numbers will also be referred to as scalars.
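The two entrywise rules (1.4) and (1.5) can be sketched in a few lines of Python. This is only an illustration: the function names `mat_add` and `scalar_mul` are made up here, and a matrix is assumed to be stored as a list of rows.

```python
def mat_add(A, B):
    """Entrywise sum s_ij = a_ij + b_ij; both matrices must have the same shape."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(c, A):
    """Entrywise product b_ij = c * a_ij."""
    return [[c * a for a in row] for row in A]


S = mat_add([[2, 1, 0], [1, 3, 5]], [[1, 0, 3], [4, -3, 1]])
T = scalar_mul(2, [[0, 1], [2, 3], [2, 1]])
print(S)  # [[3, 1, 3], [5, 0, 6]]
print(T)  # [[0, 2], [4, 6], [4, 2]]
```

Raising an error on mismatched shapes mirrors the remark that the sum is defined only for matrices of the same shape.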
The complicated notion is that of matrix multiplication. The first case to learn is the product $AB$ of a row vector $A$ (1.2) and a column vector $B$ (1.3), which is defined when both are the same size, that is, $m = n$. Then the product $AB$ is the $1 \times 1$ matrix or scalar

$$a_1b_1 + a_2b_2 + \cdots + a_mb_m. \tag{1.6}$$

(This product is often called the “dot product” of the two vectors.) Thus


$$[1 \ \ 3 \ \ 5]\begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix} = 1 - 3 + 20 = 18.$$


The usefulness of this definition becomes apparent when we regard $A$ and $B$ as vectors which represent indexed quantities. For example, consider a candy bar containing $m$ ingredients. Let $a_i$ denote the number of grams of (ingredient)$_i$ per candy bar, and let $b_i$ denote the cost of (ingredient)$_i$ per gram. Then the matrix product $AB = c$ computes the cost per candy bar:


(grams/bar) · (cost/gram) = (cost/bar).


On the other hand, the fact that we consider this to be the product of a row by a 
column is an arbitrary choice. 

In general, the product of two matrices $A$ and $B$ is defined if the number of columns of $A$ is equal to the number of rows of $B$, say if $A$ is an $\ell \times m$ matrix and $B$ is an $m \times n$ matrix. In this case, the product is an $\ell \times n$ matrix. Symbolically, $(\ell \times m) \cdot (m \times n) = (\ell \times n)$. The entries of the product matrix are computed by multiplying all rows of $A$ by all columns of $B$, using rule (1.6) above. Thus if we denote the product $AB$ by $P$, then

$$p_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}. \tag{1.7}$$

This is the product of the $i$th row of $A$ and the $j$th column of $B$.
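Formula (1.7) translates almost word for word into code. The sketch below is an illustration only: `mat_mul` is a hypothetical helper name, matrices are assumed to be lists of rows, and the indices run from 0 rather than 1.

```python
def mat_mul(A, B):
    """Product of an l x m matrix A and an m x n matrix B, given as lists of rows."""
    l, m, n = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    # p_ij = a_i1*b_1j + a_i2*b_2j + ... + a_im*b_mj, as in (1.7)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)]
            for i in range(l)]


P = mat_mul([[1, 2], [3, 4]], [[0, 1], [1, 0]])
print(P)  # [[2, 1], [4, 3]]
```

A row vector times a column vector is the special case $\ell = n = 1$, which returns a $1 \times 1$ matrix.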
For example,

$$\begin{bmatrix} 0 & -1 & 2 \\ 3 & 4 & -6 \end{bmatrix}\begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \tag{1.8}$$


This definition of matrix multiplication has turned out to provide a very convenient 
computational tool. 

Going back to our candy bar example, suppose that there are $\ell$ candy bars. Then we may form a matrix $A$ whose $i$th row measures the ingredients of (bar)$_i$. If the cost is to be computed each year for $n$ years, we may form a matrix $B$ whose $j$th column measures the cost of the ingredients in (year)$_j$. The matrix product $AB = P$ computes the cost per bar: $p_{ij}$ = cost of (bar)$_i$ in (year)$_j$.
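Matrix notation was introduced in the nineteenth century to provide a shorthand way of writing linear equations. The system of equations

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + \cdots + a_{2n}x_n &= b_2 \\ &\ \,\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

can be written in matrix notation as

$$AX = B, \tag{1.9}$$

where $A$ denotes the coefficient matrix $(a_{ij})$, $X$ and $B$ are column vectors, and $AX$ is the matrix product

$$AX = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$

Thus the matrix equation

$$\begin{bmatrix} 0 & -1 & 2 \\ 3 & 4 & -6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

represents the following system of two equations in three unknowns:

$$\begin{aligned} -x_2 + 2x_3 &= 2 \\ 3x_1 + 4x_2 - 6x_3 &= 1. \end{aligned}$$

Equation (1.8) exhibits one solution: $x_1 = 1$, $x_2 = 4$, $x_3 = 3$.

Formula (1.7) defining the product can also be written in “sigma” notation as

$$p_{ij} = \sum_{k=1}^{m} a_{ik}b_{kj} = \sum_{k} a_{ik}b_{kj}.$$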
Each of these expressions is a shorthand notation for the sum (1.7) which defines the
product matrix.

Our two most important notations for handling sets of numbers are the $\Sigma$ or
sum notation as used above and matrix notation. The $\Sigma$ notation is actually the more
versatile of the two, but because matrices are much more compact we will use them
whenever possible. One of our tasks in later chapters will be to translate complicated
mathematical structures into matrix notation in order to be able to work with them
conveniently.

Various identities are satisfied by the matrix operations, such as the distributive laws

$$A(B + B') = AB + AB', \quad\text{and}\quad (A + A')B = AB + A'B \tag{1.10}$$

and the associative law

$$(AB)C = A(BC). \tag{1.11}$$

These laws hold whenever the matrices involved have suitable sizes, so that the products are defined. For the associative law, for example, the sizes should be $A = \ell \times m$, $B = m \times n$, and $C = n \times p$, for some $\ell, m, n, p$. Since the two products (1.11) are equal, the parentheses are not required, and we will denote them by $ABC$. The triple product $ABC$ is then an $\ell \times p$ matrix. For example, the two ways of computing the product


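$$ABC = \begin{bmatrix} 1 \\ 2 \end{bmatrix}[1 \ \ 0 \ \ 1]\begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}$$

are

$$(AB)C = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 0 & 2 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix} \quad\text{and}\quad A(BC) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}[2 \ \ 1] = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}.$$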
Scalar multiplication is compatible with matrix multiplication in the obvious sense:

$$c(AB) = (cA)B = A(cB). \tag{1.12}$$

The proofs of these identities are straightforward and not very interesting.

In contrast, the commutative law does not hold for matrix multiplication; that is,

$$AB \neq BA, \quad\text{usually}. \tag{1.13}$$


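In fact, if $A$ is an $\ell \times m$ matrix and $B$ is an $m \times \ell$ matrix, so that $AB$ and $BA$ are both defined, then $AB$ is $\ell \times \ell$ while $BA$ is $m \times m$. Even if both matrices are square, say $m \times m$, the two products tend to be different. For instance,

$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad\text{while}\quad \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$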
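A quick numerical check makes the contrast between the associative law and the failure of the commutative law concrete. The following is a sketch with made-up sample matrices; `mat_mul` is a hypothetical helper and matrices are assumed to be lists of rows.

```python
def mat_mul(A, B):
    """Matrix product of lists-of-rows A (l x m) and B (m x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Associativity: (AB)C and A(BC) agree for a 2x1, a 1x3 and a 3x2 matrix.
A = [[1], [2]]
B = [[1, 0, 1]]
C = [[2, 0], [1, 1], [0, 1]]
left = mat_mul(mat_mul(A, B), C)
right = mat_mul(A, mat_mul(B, C))
print(left, right)  # [[2, 1], [4, 2]] [[2, 1], [4, 2]]

# Commutativity fails even for square matrices.
U = [[0, 1], [0, 0]]
V = [[0, 0], [0, 1]]
UV = mat_mul(U, V)
VU = mat_mul(V, U)
print(UV, VU)  # [[0, 1], [0, 0]] [[0, 0], [0, 0]]
```

A check on sample matrices is of course not a proof of (1.11); it only illustrates the statement.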
Since matrix multiplication is not commutative, care must be taken when
working with matrix equations. We can multiply both sides of an equation $B = C$ on
the left by a matrix $A$, to conclude that $AB = AC$, provided that the products are
defined. Similarly, if the products are defined, then we can conclude that $BA = CA$.
We cannot derive $AB = CA$ from $B = C$!

Any matrix all of whose entries are 0 is called a zero matrix and is denoted by
0, though its size is arbitrary. Maybe $0_{m \times n}$ would be better.

The entries $a_{ii}$ of a matrix $A$ are called its diagonal entries, and a matrix $A$
is called a diagonal matrix if its only nonzero entries are diagonal entries.

The square $n \times n$ matrix whose only nonzero entries are 1 in each diagonal position,

$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}, \tag{1.14}$$

is called the $n \times n$ identity matrix. It behaves like 1 in multiplication: If $A$ is an $m \times n$ matrix, then

$$I_mA = A \quad\text{and}\quad AI_n = A.$$

Here are some shorthand ways of drawing the matrix $I_n$:

$$I_n = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix}.$$

We often indicate that a whole region in a matrix consists of zeros by leaving it blank or by putting in a single 0.

We will use $*$ to indicate an arbitrary undetermined entry of a matrix. Thus

$$\begin{bmatrix} * & \cdots & * \\ & \ddots & \vdots \\ 0 & & * \end{bmatrix}$$

may denote a square matrix whose entries below the diagonal are 0, the other entries being undetermined. Such a matrix is called an upper triangular matrix.
Let $A$ be a (square) $n \times n$ matrix. If there is a matrix $B$ such that

$$AB = I_n \quad\text{and}\quad BA = I_n, \tag{1.15}$$

then $B$ is called an inverse of $A$ and is denoted by $A^{-1}$:

$$A^{-1}A = I_n = AA^{-1}. \tag{1.16}$$

When $A$ has an inverse, it is said to be an invertible matrix. For example, the matrix $A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$ is invertible. Its inverse is $A^{-1} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}$, as is seen by computing the products $AA^{-1}$ and $A^{-1}A$. Two more examples are:


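$$\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}.$$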


We will see later that $A$ is invertible if there is a matrix $B$ such that either one of the two relations $AB = I_n$ or $BA = I_n$ holds, and that $B$ is then the inverse [see (2.23)]. But since multiplication of matrices is not commutative, this fact is not obvious. It fails for matrices which aren’t square. For example, let $A = [1 \ \ 2]$ and let $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then $AB = [1] = I_1$, but $BA = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} \neq I_2$.

On the other hand, an inverse is unique if it exists at all. In other words, there can be only one inverse. Let $B, B'$ be two matrices satisfying (1.15), for the same matrix $A$. We need only know that $AB = I_n$ ($B$ is a right inverse) and that $B'A = I_n$ ($B'$ is a left inverse). By the associative law, $B'(AB) = (B'A)B$. Thus

$$B' = B'I = B'(AB) = (B'A)B = IB = B, \tag{1.17}$$

and so $B' = B$. □


(1.18) Proposition. Let $A, B$ be $n \times n$ matrices. If both are invertible, so is their product $AB$, and

$$(AB)^{-1} = B^{-1}A^{-1}.$$

More generally, if $A_1, \ldots, A_m$ are invertible, then so is the product $A_1 \cdots A_m$, and its inverse is $A_m^{-1} \cdots A_1^{-1}$.


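Thus the inverse of $\begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ is $\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} 1 & -\tfrac{1}{2} \\ 0 & \tfrac{1}{2} \end{bmatrix}$.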


Proof. Assume that $A, B$ are invertible. Then we check that $B^{-1}A^{-1}$ is the inverse of $AB$:

$$ABB^{-1}A^{-1} = AIA^{-1} = AA^{-1} = I,$$

and similarly

$$B^{-1}A^{-1}AB = \cdots = I.$$


The last assertion is proved by induction on $m$ [see Appendix (2.3)]. When $m = 1$, the assertion is that if $A_1$ is invertible then $A_1^{-1}$ is the inverse of $A_1$, which is trivial. Next we assume that the assertion is true for $m = k$, and we proceed to check it for $m = k + 1$. We suppose that $A_1, \ldots, A_{k+1}$ are invertible $n \times n$ matrices, and we denote by $P$ the product $A_1 \cdots A_k$ of the first $k$ matrices. By the induction hypothesis, $P$ is invertible, and its inverse is $A_k^{-1} \cdots A_1^{-1}$. Also, $A_{k+1}$ is invertible. So, by what has been shown for two invertible matrices, the product $PA_{k+1} = A_1 \cdots A_kA_{k+1}$ is invertible, and its inverse is $A_{k+1}^{-1}P^{-1} = A_{k+1}^{-1}A_k^{-1} \cdots A_1^{-1}$. This shows that the assertion is true for $m = k + 1$, which completes the induction proof. □
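The order reversal in Proposition (1.18) is easy to get wrong, so a small check may help. The sketch below uses two sample integer matrices whose inverses are written out by hand; `mat_mul` and the variable names are illustrative assumptions, not notation from the text.

```python
def mat_mul(A, B):
    """Matrix product of lists-of-rows A (l x m) and B (m x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


I2 = [[1, 0], [0, 1]]
A, A_inv = [[2, 1], [5, 3]], [[3, -1], [-5, 2]]
B, B_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]

AB = mat_mul(A, B)
candidate = mat_mul(B_inv, A_inv)    # B^{-1} A^{-1}, the order the proposition gives
wrong_order = mat_mul(A_inv, B_inv)  # A^{-1} B^{-1}, the tempting but wrong order

print(mat_mul(AB, candidate) == I2 and mat_mul(candidate, AB) == I2)  # True
print(mat_mul(AB, wrong_order) == I2)                                 # False
```

Both products with `candidate` are checked because definition (1.15) asks for a two-sided inverse.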
Though this isn’t clear from the definition of matrix multiplication, we will see
that most square matrices are invertible. But finding the inverse explicitly is not a
simple problem when the matrix is large.

The set of all invertible $n \times n$ matrices is called the $n$-dimensional general linear
group and is denoted by $GL_n$. The general linear groups will be among our most
important examples when we study the basic concept of a group in the next chapter.

Various tricks simplify matrix multiplication in favorable cases. Block multiplication is one of them. Let $M, M'$ be $m \times n$ and $n \times p$ matrices, and let $r$ be an integer less than $n$. We may decompose the two matrices into blocks as follows:

$$M = [A \mid B] \quad\text{and}\quad M' = \begin{bmatrix} A' \\ B' \end{bmatrix},$$

where $A$ has $r$ columns and $A'$ has $r$ rows. Then the matrix product can be computed as follows:

$$MM' = AA' + BB'. \tag{1.19}$$


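This decomposition of the product follows directly from the definition of multiplication, and it may facilitate computation. For example,

$$\left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & 7 \end{array}\right]\begin{bmatrix} 2 & 3 \\ 4 & 8 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 2 & 3 \\ 4 & 8 \end{bmatrix} + \begin{bmatrix} 5 \\ 7 \end{bmatrix}[0 \ \ 0] = \begin{bmatrix} 2 & 3 \\ 4 & 8 \end{bmatrix}.$$

Note that formula (1.19) looks the same as rule (1.6) for multiplying a row vector and a column vector.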
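Formula (1.19) can be checked mechanically by slicing a matrix into its column blocks and its partner into the matching row blocks. This is a sketch under the assumption that matrices are lists of rows; `mat_mul`, `mat_add` and `block_product` are hypothetical names, and the sample matrices are made up for the illustration.

```python
def mat_mul(A, B):
    """Matrix product of lists-of-rows A (l x m) and B (m x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def block_product(M, M_prime, r):
    """Compute M M' as A A' + B B', where A is the first r columns of M."""
    A = [row[:r] for row in M]   # m x r
    B = [row[r:] for row in M]   # m x (n - r)
    A_prime = M_prime[:r]        # r x p
    B_prime = M_prime[r:]        # (n - r) x p
    return mat_add(mat_mul(A, A_prime), mat_mul(B, B_prime))


M = [[1, 0, 5], [0, 1, 7]]
M_prime = [[2, 3], [4, 8], [1, 1]]
print(mat_mul(M, M_prime))            # [[7, 8], [11, 15]]
print(block_product(M, M_prime, 2))   # [[7, 8], [11, 15]]
```

The split point `r` must satisfy 0 < r < n so that neither block is empty.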

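We may also multiply matrices divided into more blocks. For our purposes, a decomposition into four blocks will be the most useful. In this case the rule for block multiplication is the same as for multiplication of $2 \times 2$ matrices. Let $r + s = n$ and let $k + \ell = m$. Suppose we decompose an $m \times n$ matrix $M$ and an $n \times p$ matrix $M'$ into submatrices

$$M = \left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right], \quad M' = \left[\begin{array}{c|c} A' & B' \\ \hline C' & D' \end{array}\right],$$

where the number of columns of $A$ is equal to the number of rows of $A'$. Then the rule for block multiplication is

$$\left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right]\left[\begin{array}{c|c} A' & B' \\ \hline C' & D' \end{array}\right] = \left[\begin{array}{c|c} AA' + BC' & AB' + BD' \\ \hline CA' + DC' & CB' + DD' \end{array}\right]. \tag{1.20}$$

For example,

$$\left[\begin{array}{cc|c} 1 & 0 & 5 \\ \hline 0 & 1 & 7 \end{array}\right]\left[\begin{array}{cc|cc} 2 & 3 & 1 & 1 \\ 4 & 1 & 0 & 0 \\ \hline 0 & 1 & 1 & 0 \end{array}\right] = \left[\begin{array}{cc|cc} 2 & 8 & 6 & 1 \\ \hline 4 & 8 & 7 & 0 \end{array}\right].$$

In this product, the upper left block is $[1 \ \ 0]\begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix} + [5][0 \ \ 1] = [2 \ \ 8]$, etc.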

Again, this rule can be verified directly from the definition of matrix multipli- 
cation. In general, block multiplication can be used whenever two matrices are de- 
composed into submatrices in such a way that the necessary products are defined. 

Besides facilitating computations, block multiplication is a useful tool for prov- 
ing facts about matrices by induction. 


2. ROW REDUCTION 




Let $A = (a_{ij})$ be an $m \times n$ matrix, and consider a variable $n \times p$ matrix $X = (x_{ij})$. Then the matrix equation

$$Y = AX \tag{2.1}$$

defines the $m \times p$ matrix $Y = (y_{ij})$ as a function of $X$. This operation is called left multiplication by $A$:

$$y_{ij} = a_{i1}x_{1j} + \cdots + a_{in}x_{nj}. \tag{2.2}$$

Notice that in formula (2.2) the entry $y_{ij}$ depends only on $x_{1j}, \ldots, x_{nj}$, that is, on the $j$th column of $X$ and on the $i$th row of the matrix $A$. Thus $A$ operates separately on each column of $X$, and we can understand the way $A$ operates by considering its action on column vectors:

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$

Left multiplication by $A$ on column vectors can be thought of as a function from the space of $n$-dimensional column vectors $X$ to the space of $m$-dimensional column vectors $Y$, or a collection of $m$ functions of $n$ variables:

$$y_i = a_{i1}x_1 + \cdots + a_{in}x_n \quad (i = 1, \ldots, m).$$

It is called a linear transformation, because the functions are homogeneous and linear. (A linear function of a set of variables $u_1, \ldots, u_k$ is one of the form $a_1u_1 + \cdots + a_ku_k + c$, where $a_1, \ldots, a_k, c$ are scalars. Such a function is homogeneous linear if the constant term $c$ is zero.)


A picture of the operation of a $2 \times 2$ matrix is shown below. It maps 2-space to 2-space:


(2.3) Figure.


Going back to the operation of $A$ on an $n \times p$ matrix $X$, we can interpret the fact that $A$ acts in the same way on each column of $X$ as follows: Let $Y_i$ denote the $i$th row of $Y$, which we view as a row vector:

$$Y_i = [y_{i1} \ \cdots \ y_{ip}].$$

We can compute $Y_i$ in terms of the rows $X_j$ of $X$, in vector notation, as

$$Y_i = a_{i1}X_1 + \cdots + a_{in}X_n. \tag{2.4}$$

This is just a restatement of (2.2), and it is another example of block multiplication. For example, the bottom row of the product

$$\begin{bmatrix} 0 & -1 & 2 \\ 3 & 4 & -6 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 4 & 2 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & -4 \end{bmatrix}$$

can be computed as $3[1 \ \ 0] + 4[4 \ \ 2] - 6[3 \ \ 2] = [1 \ \ {-4}]$.

When $A$ is a square matrix, we often speak of left multiplication by $A$ as a row operation.
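Formula (2.4) says that each row of $AX$ is a combination of the rows of $X$, with coefficients taken from the corresponding row of $A$. A minimal sketch, assuming matrices are lists of rows and using 0-based indices; `row_of_product` is a hypothetical name.

```python
def row_of_product(A, X, i):
    """Row i of AX, computed as a_i1*X_1 + ... + a_in*X_n (formula (2.4)); i is 0-based."""
    row = [0] * len(X[0])
    for j, X_j in enumerate(X):
        # add a_ij times the j-th row of X
        row = [r + A[i][j] * x for r, x in zip(row, X_j)]
    return row


A = [[0, -1, 2], [3, 4, -6]]
X = [[1, 0], [4, 2], [3, 2]]
Y = [row_of_product(A, X, i) for i in range(len(A))]
print(Y)  # [[2, 2], [1, -4]]
```

Computing the product row by row in this way gives the same matrix as the entrywise formula (2.2).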
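The simplest nonzero matrices are the matrix units, which we denote by $e_{ij}$:

$$e_{ij} = \begin{bmatrix} & \vdots & \\ \cdots & 1 & \cdots \\ & \vdots & \end{bmatrix} \quad (\text{the } 1 \text{ is in row } i, \text{ column } j). \tag{2.5}$$

This matrix $e_{ij}$ has a 1 in the $(i, j)$ position as its only nonzero entry. (We usually denote matrices by capital letters, but the use of a small letter for the matrix units is traditional.) Matrix units are useful because every matrix $A = (a_{ij})$ can be written out as a sum in the following way:

$$A = a_{11}e_{11} + a_{12}e_{12} + \cdots + a_{nn}e_{nn} = \sum_{i,j} a_{ij}e_{ij}.$$

The indices $i, j$ under the sigma mean that the sum is to be taken over all values of $i$ and all values of $j$. For instance

$$\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} = 3\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + 2\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + 1\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + 4\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = 3e_{11} + 2e_{12} + 1e_{21} + 4e_{22}.$$

Such a sum is called a linear combination of the matrices $e_{ij}$.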

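The matrix units are convenient for the study of addition and scalar multiplication of matrices. But to study matrix multiplication, some square matrices called elementary matrices are more useful. There are three types of elementary matrix:

$$\begin{bmatrix} 1 & & & \\ & \ddots & a & \\ & & \ddots & \\ & & & 1 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 1 & & & \\ & \ddots & & \\ & a & \ddots & \\ & & & 1 \end{bmatrix} = I + ae_{ij} \quad (i \neq j). \tag{2.6i}$$

Such a matrix has diagonal entries 1 and one nonzero off-diagonal entry.

$$\begin{bmatrix} 1 & & & & \\ & 0 & \cdots & 1 & \\ & \vdots & \ddots & \vdots & \\ & 1 & \cdots & 0 & \\ & & & & 1 \end{bmatrix} = I + e_{ij} + e_{ji} - e_{ii} - e_{jj}. \tag{2.6ii}$$

Here the $i$th and $j$th diagonal entries of $I$ are replaced by zero, and two 1’s are added in the $(i, j)$ and $(j, i)$ positions. (The formula in terms of the matrix units is rather ugly, and we won’t use it much.)

$$\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & c & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} = I + (c - 1)e_{ii} \quad (c \neq 0). \tag{2.6iii}$$

One diagonal entry of the identity matrix is replaced by a nonzero number $c$.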
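The elementary $2 \times 2$ matrices are

$$\text{(i)}\ \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ a & 1 \end{bmatrix}, \quad \text{(ii)}\ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \text{(iii)}\ \begin{bmatrix} c & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & c \end{bmatrix},$$

where, as above, $a$ is arbitrary and $c$ is an arbitrary nonzero number.
The elementary matrices $E$ operate on a matrix $X$ as described below.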
The elementary matrices E operate on a matrix X as described below. 


(2.7) To get the matrix $EX$, you must:


Type (i): Replace the $i$th row $X_i$ by $X_i + aX_j$, or
add $a \cdot$(row $j$) to (row $i$);

Type (ii): Interchange (row $i$) and (row $j$);

Type (iii): Multiply (row $i$) by a nonzero scalar $c$.


These operations are called elementary row operations. Thus multiplication by an
elementary matrix is an elementary row operation. You should verify these rules of
multiplication carefully.
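One way to verify these rules is to build the three types of elementary matrix and multiply. The sketch below is an illustration only: the helper names are made up, matrices are assumed to be lists of rows, and row indices are 0-based, so "row 0" is the first row.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Matrix product of lists-of-rows A (l x m) and B (m x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def elem_add(n, i, j, a):
    """Type (i): I + a*e_ij with i != j; EX adds a*(row j) to (row i)."""
    E = identity(n)
    E[i][j] = a
    return E


def elem_swap(n, i, j):
    """Type (ii): EX interchanges (row i) and (row j)."""
    E = identity(n)
    E[i][i] = E[j][j] = 0
    E[i][j] = E[j][i] = 1
    return E


def elem_scale(n, i, c):
    """Type (iii): EX multiplies (row i) by the nonzero scalar c."""
    E = identity(n)
    E[i][i] = c
    return E


X = [[1, 2], [3, 4], [5, 6]]
print(mat_mul(elem_add(3, 0, 2, 10), X))  # [[51, 62], [3, 4], [5, 6]]
print(mat_mul(elem_swap(3, 0, 1), X))     # [[3, 4], [1, 2], [5, 6]]
print(mat_mul(elem_scale(3, 2, 2), X))    # [[1, 2], [3, 4], [10, 12]]
```

Multiplying `elem_add(3, 0, 2, 10)` by `elem_add(3, 0, 2, -10)` gives the identity, which illustrates that the inverse of a Type (i) matrix is again of Type (i).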


(2.8) Lemma. Elementary matrices are invertible, and their inverses are also elementary matrices.


The proof of this lemma is just a calculation. The inverse of an elementary matrix is the matrix corresponding to the inverse row operation: If $E = I + ae_{ij}$ is of Type (i), then $E^{-1} = I - ae_{ij}$; “subtract $a \cdot$(row $j$) from (row $i$)”. If $E$ is of Type (ii), then $E^{-1} = E$, and if $E$ is of Type (iii), then $E^{-1}$ is of the same type, with $c^{-1}$ in the position that $c$ has in $E$; “multiply (row $i$) by $c^{-1}$”. □
We will now study the effect of elementary row operations (2.7) on a matrix $A$, with the aim of ending up with a simpler matrix $A'$:

$$A \xrightarrow{\ \text{sequence of operations}\ } A'.$$

Since each elementary row operation is obtained as the result of multiplication by an elementary matrix, we can express the result of a succession of such operations as multiplication by a sequence $E_1, \ldots, E_k$ of elementary matrices:

$$A' = E_k \cdots E_2E_1A. \tag{2.9}$$


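This procedure is called row reduction, or Gaussian elimination. For example, we can simplify the matrix

$$M = \begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 1 & 1 & 5 & 2 & 7 \\ 1 & 2 & 8 & 4 & 12 \end{bmatrix} \tag{2.10}$$

by using the first type of elementary operation to clear out as many entries as possible:

$$\begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 1 & 1 & 5 & 2 & 7 \\ 1 & 2 & 8 & 4 & 12 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 \\ 1 & 2 & 8 & 4 & 12 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 \\ 0 & 2 & 6 & 3 & 7 \end{bmatrix}$$

$$\to \begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 0 & 2 \\ 0 & 1 & 3 & 0 & -1 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix}.$$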


Row reduction is a useful method of solving systems of linear equations. Suppose
we are given a system of m equations in n unknowns, say AX = B as in (1.9),
where A is an m × n matrix, X is an unknown column vector, and B is a given
column vector. To solve this system, we form the m × (n + 1) block matrix


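$$M = [A \mid B] = \left[\begin{array}{ccc|c} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{array}\right], \tag{2.11}$$

and we perform row operations to simplify M. Note that EM = [EA | EB]. Let

$$M' = [A' \mid B']$$

be the result of a sequence of row operations. The key observation follows: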
(2.12) Proposition. The solutions of A’X = B’ are the same as those of AX = B. 

Proof. Since M' is obtained by a sequence of elementary row operations,

$$M' = E_k \cdots E_1 M.$$

Let $P = E_k \cdots E_1$. This matrix is invertible, by Lemma (2.8) and Proposition (1.18).
Also, M' = [A' | B'] = [PA | PB]. If X is a solution of the original system AX = B,
then PAX = PB, which is to say, A'X = B'. So X also solves the new system. Conversely,
if A'X = B', then $AX = P^{-1}A'X = P^{-1}B' = B$, so X solves the system
AX = B too. □


For example, consider the system 


$$\begin{aligned} x_1 + 2x_3 + x_4 &= 5 \\ x_1 + x_2 + 5x_3 + 2x_4 &= 7 \\ x_1 + 2x_2 + 8x_3 + 4x_4 &= 12. \end{aligned} \tag{2.13}$$


Its augmented matrix is the matrix M considered above (2.10), so our row reduction 
of this matrix shows that this system of equations is equivalent to 


$$\begin{aligned} x_1 + 2x_3 &= 2 \\ x_2 + 3x_3 &= -1 \\ x_4 &= 3. \end{aligned}$$

We can read off the solutions of this system immediately: We may choose $x_3$ arbitrarily
and then solve for $x_1$, $x_2$, and $x_4$. The general solution of (2.13) can therefore be
written in the form

$$x_3 = c_3, \quad x_1 = 2 - 2c_3, \quad x_2 = -1 - 3c_3, \quad x_4 = 3,$$

where $c_3$ is arbitrary.
We now go back to row reduction of an arbitrary matrix. It is not hard to see 
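that, by a sequence of row operations, any matrix A can be reduced to one which
looks roughly like this:

$$A = \begin{bmatrix} 1 & * & * & 0 & * & * & 0 & * & \cdots & * & 0 \\ & & & 1 & * & * & 0 & * & \cdots & * & 0 \\ & & & & & & 1 & * & \cdots & * & 0 \\ & & & & & & & & \ddots & & \vdots \\ & & & & & & & & & & 1 \end{bmatrix} \tag{2.14}$$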


where * denotes an arbitrary number and the large blank space consists of zeros. 


This is called a row echelon matrix. For instance,

$$\begin{bmatrix} 1 & 6 & 0 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

is a row echelon matrix. So is the end result of our reduction of (2.10). The
definition of a row echelon matrix is given in (2.15):


(2.15)


(a) The first nonzero entry in every row is 1. This entry is called a pivot.

(b) The first nonzero entry of row i + 1 is to the right of the first nonzero
entry of row i.

(c) The entries above a pivot are zero.


To make a row reduction, find the first column which contains a nonzero entry.
(If there is none, then A = 0, and 0 is a row echelon matrix.) Interchange rows
using an elementary operation of Type (ii), moving a nonzero entry to the top row.
Normalize this entry to 1 using an operation of Type (iii). Then clear out the other
entries in its column by a sequence of operations of Type (i). The resulting matrix
will have the block form

$$A' = \left[\begin{array}{c|c|c} 0 \cdots 0 & 1 & B \\ \hline 0 & 0 & D \end{array}\right],$$

in which the blocks marked 0 consist of zeros.
We now continue, performing row operations on the smaller matrix D (cooking until
done). Formally, this is induction on the size of the matrix. The principle of com- 
plete induction [see Appendix (2-6)] allows us to assume that every matrix with 
fewer rows than A can be reduced to row echelon form. Since D has fewer rows, we 
may assume that it can be reduced to a row echelon matrix, say D”. The row opera- 
tions we perform to reduce D to D” will not change the other blocks making up A’. 
Therefore A’ can be reduced to the matrix 


$$\left[\begin{array}{c|c|c} 0 \cdots 0 & 1 & B \\ \hline 0 & 0 & D'' \end{array}\right],$$


which satisfies requirements (2.15a and b) for a row echelon matrix. Therefore our 
original matrix A can be reduced to this form. The entries in B above the pivots of D” 
can be cleared out at this time, to finish the reduction to row echelon form. □
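
The reduction procedure can be sketched in code. This is a minimal illustration, not the inductive argument itself: it works column by column in a loop and clears the entries above and below each pivot as soon as the pivot is found, which is an assumed simplification. The function name `row_reduce` is hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def row_reduce(M):
    """Return a row echelon form of M in the sense of (2.15), using only
    elementary row operations of Types (i), (ii), (iii)."""
    A = [[Fraction(x) for x in row] for row in M]
    rows = len(A)
    cols = len(A[0]) if A else 0
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Look for a nonzero entry in this column, at or below pivot_row.
        src = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if src is None:
            continue  # no pivot in this column
        # Type (ii): move the nonzero entry up to the pivot row.
        A[pivot_row], A[src] = A[src], A[pivot_row]
        # Type (iii): normalize the pivot to 1.
        c = A[pivot_row][col]
        A[pivot_row] = [x / c for x in A[pivot_row]]
        # Type (i): clear the other entries of the pivot column.
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                a = A[r][col]
                A[r] = [x - a * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A

# The augmented matrix of the system x1 + 2x3 + x4 = 5, x1 + x2 + 5x3 + 2x4 = 7,
# x1 + 2x2 + 8x3 + 4x4 = 12.
M = [[1, 0, 2, 1, 5],
     [1, 1, 5, 2, 7],
     [1, 2, 8, 4, 12]]
reduced = row_reduce(M)
```

For this input the last column of `reduced` holds the right-hand sides 2, −1, 3 of the simplified system.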


It can be shown that the row echelon matrix obtained from a given matrix A by
row reduction is unique, that is, that it does not depend on the particular sequence of
operations used. However, this is not a very important point, so we omit the proof.
The reason that row reduction is useful is that we can solve a system of equations
A'X = B' immediately if A' is in row echelon form. For example, suppose that


$$[A' \mid B'] = \left[\begin{array}{cccc|c} 1 & 6 & 0 & 1 & 1 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right].$$


There is no solution to A’X = B’ because the third equation is 0 = 1. On the other 
hand, 


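$$[A' \mid B'] = \left[\begin{array}{cccc|c} 1 & 6 & 0 & 1 & 1 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$

has solutions. Choosing $x_2$, $x_4$ arbitrarily, we can solve the first equation for $x_1$ and
the second for $x_3$. This is the procedure we use to solve system (2.13).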
The general rule is as follows: 


(2.16) Proposition. Let M’ = [A’|B’] be a row echelon matrix. Then the system 
of equations A’X = B’ has a solution if and only if there is no pivot in the last 
column B'. In that case, an arbitrary value can be assigned to the unknown $x_i$ if
column i does not contain a pivot. □
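
The criterion of Proposition (2.16) is easy to apply mechanically. The sketch below assumes its input is already a row echelon augmented matrix [A' | B']; the names `pivot_columns` and `analyze` are hypothetical, and columns are indexed from 0, so index 1 stands for the unknown $x_2$.

```python
def pivot_columns(M):
    """Column index of the first nonzero entry of each nonzero row."""
    pivots = []
    for row in M:
        for j, x in enumerate(row):
            if x != 0:
                pivots.append(j)
                break
    return pivots

def analyze(M):
    """For a row echelon augmented matrix M = [A'|B'], return a pair
    (solvable, free) where `free` lists the unknowns that may be chosen
    arbitrarily (empty if the system has no solution)."""
    n = len(M[0]) - 1               # number of unknowns
    pivots = pivot_columns(M)
    solvable = n not in pivots      # no pivot in the last column B'
    free = [j for j in range(n) if j not in pivots] if solvable else []
    return solvable, free

inconsistent = [[1, 6, 0, 1, 1],
                [0, 0, 1, 2, 0],
                [0, 0, 0, 0, 1]]    # third equation reads 0 = 1
consistent = [[1, 6, 0, 1, 1],
              [0, 0, 1, 2, 3],
              [0, 0, 0, 0, 0]]
```

Here `analyze(consistent)` reports the free unknowns with indices 1 and 3, that is, $x_2$ and $x_4$.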
Of course every homogeneous linear system AX = 0 has the trivial solution
X = 0. But looking at the row echelon form again, we can conclude that if there are
more unknowns than equations then the homogeneous equation AX = 0 has a nontrivial
solution for X:

(2.17) Corollary. Every system AX = 0 of m homogeneous equations in n unknowns,
with m < n, has a solution X in which some $x_i$ is nonzero.


For, let A'X = 0 be the associated row echelon equation, and let r be the number of
pivots of A'. Then r ≤ m. According to the proposition, we may assign arbitrary
values to n − r variables $x_i$, and n − r > 0 because r ≤ m < n. □


We will now use row reduction to characterize square invertible matrices. 


(2.18) Proposition. Let A be a square matrix. The following conditions are equiva- 
lent: 


(a) A can be reduced to the identity by a sequence of elementary row operations. 
(b) A is a product of elementary matrices. 
(c) A is invertible.
(d) The system of homogeneous equations AX = 0 has only the trivial solution
X = 0.


Proof. We will prove this proposition by proving the implications (a) ⇒ (b) ⇒
(c) ⇒ (d) ⇒ (a). To show that (a) implies (b), suppose that A can be reduced to the
identity by row operations: $E_k \cdots E_1 A = I$. Multiplying both sides of this equation on
the left by $E_1^{-1} \cdots E_k^{-1}$, we obtain $A = E_1^{-1} \cdots E_k^{-1}$. Since the inverse of an elementary
matrix is elementary, this shows that A is a product of elementary matrices. Because
a product of elementary matrices is invertible, (b) implies (c). If A is invertible
we can multiply both sides of the equation AX = 0 by $A^{-1}$ to derive X = 0. So the
equation AX = 0 has only the trivial solution. This shows that (c) implies (d).

To prove the last implication, that (d) implies (a), we take a look at square row 
echelon matrices M. We note the following dichotomy: 


(2.19) Let M be a square row echelon matrix.
Either M is the identity matrix, or its bottom row is zero.


This is easy to see, from (2.15). 

Suppose that (a) does not hold for a given matrix A. Then A can be reduced by 
row operations to a matrix A’ whose bottom row is zero. In this case there are at 
most n—1 nontrivial equations in the linear system A'X = 0, and so Corollary (2.17) 
tells us that this system has a nontrivial solution. Since the equation AX = 0 
is equivalent to A'X = 0, it has a nontrivial solution as well. This shows that if (a)
fails then (d) does too; hence (d) implies (a). This completes the proof of Proposition 
(2.18). 
□


(2.20) Corollary. If a row of a square matrix A is zero, then A is not invertible. □

Row reduction provides a method of computing the inverse of an invertible
matrix A: We reduce A to the identity by row operations:

$$E_k \cdots E_1 A = I$$

as above. Multiplying both sides of this equation on the right by $A^{-1}$, we have

$$E_k \cdots E_1 I = A^{-1}.$$

(2.21) Corollary. Let A be an invertible matrix. To compute its inverse $A^{-1}$, apply
elementary row operations $E_1, \ldots, E_k$ to A, reducing it to the identity matrix. The
same sequence of operations, when applied to I, yields $A^{-1}$.


The corollary is just a restatement of the two equations. □


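(2.22) Example. We seek the inverse of the matrix

$$A = \begin{bmatrix} 5 & 4 \\ 6 & 5 \end{bmatrix}.$$

To compute it we form the 2 × 4 block matrix

$$[A \mid I] = \left[\begin{array}{cc|cc} 5 & 4 & 1 & 0 \\ 6 & 5 & 0 & 1 \end{array}\right].$$

We perform row operations to reduce A to the identity, carrying the right side along,
and thereby end up with $A^{-1}$ on the right because of Corollary (2.21).

$$\begin{aligned}
[A \mid I] &= \left[\begin{array}{cc|cc} 5 & 4 & 1 & 0 \\ 6 & 5 & 0 & 1 \end{array}\right] && \text{Subtract (row 1) from (row 2)} \\
&\to \left[\begin{array}{cc|cc} 5 & 4 & 1 & 0 \\ 1 & 1 & -1 & 1 \end{array}\right] && \text{Subtract } 4 \cdot \text{(row 2) from (row 1)} \\
&\to \left[\begin{array}{cc|cc} 1 & 0 & 5 & -4 \\ 1 & 1 & -1 & 1 \end{array}\right] && \text{Subtract (row 1) from (row 2)} \\
&\to \left[\begin{array}{cc|cc} 1 & 0 & 5 & -4 \\ 0 & 1 & -6 & 5 \end{array}\right] = [I \mid A^{-1}].
\end{aligned}$$

Thus $A^{-1} = \begin{bmatrix} 5 & -4 \\ -6 & 5 \end{bmatrix}$.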
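
The method of Corollary (2.21) can be sketched as a short routine that row reduces the block matrix [A | I] and reads the inverse off the right half. This is an illustrative sketch only: the name `inverse` is hypothetical, the choice of pivots is one of many possible, and exact fractions are assumed so that no rounding occurs.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by reducing [A | I] to [I | A^-1].
    Raises ValueError if A cannot be reduced to the identity."""
    n = len(A)
    # Form the n x 2n block matrix [A | I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        src = next((r for r in range(col, n) if M[r][col] != 0), None)
        if src is None:
            raise ValueError("matrix is not invertible")
        M[col], M[src] = M[src], M[col]              # Type (ii)
        c = M[col][col]
        M[col] = [x / c for x in M[col]]             # Type (iii)
        for r in range(n):                           # Type (i)
            if r != col and M[r][col] != 0:
                a = M[r][col]
                M[r] = [x - a * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[5, 4], [6, 5]]
A_inv = inverse(A)
```

The routine may use a different sequence of row operations than a hand computation would, but the resulting inverse is the same.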


(2.23) Proposition. Let A be a square matrix which has either a left inverse B:
BA = I, or a right inverse: AB = I. Then A is invertible, and B is its inverse.


Proof. Suppose that AB = I. We perform row reduction on A. According to
(2.19), there are elementary matrices $E_1, \ldots, E_k$ so that $A' = E_k \cdots E_1 A$ either is the
identity matrix or has bottom row zero. Then $A'B = E_k \cdots E_1$, which is an invertible
matrix. Hence the bottom row of A'B is not zero, and it follows that A' has a nonzero
bottom row too. So A' = I. By (2.18), A is invertible, and the equations
$I = E_k \cdots E_1 A$ and AB = I show that $A^{-1} = E_k \cdots E_1 = B$ (see (1.17)). The other case
is that BA = I. Then we can interchange A and B in the above argument and conclude
that B is invertible and A is its inverse. So A is invertible too. □


For most of this discussion, we could have worked with columns rather than 
rows. We chose to work with rows in order to apply the results to systems of linear 
equations; otherwise columns would have served just as well. Rows and columns are 
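interchanged by the matrix transpose. The transpose of an m × n matrix A is the
n × m matrix $A^t$ obtained by reflecting about the diagonal: $A^t = (b_{ij})$, where

$$b_{ij} = a_{ji}.$$

For instance,

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^t = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}^t = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$

The rules for computing with the transpose are given in (2.24):


(2.24)
(a) $(A + B)^t = A^t + B^t$.
(b) $(cA)^t = cA^t$.
(c) $(AB)^t = B^tA^t$.
(d) $(A^t)^t = A$.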


Using formulas (2.24c and d), we can deduce facts about right multiplication, 
XP, from the corresponding facts about left multiplication. 

The elementary matrices (2.6) act by right multiplication as the following ele- 
mentary column operations: 


(2.25)


(a) Add a·(column i) to (column j).
(b) Interchange (column i) and (column j).
(c) Multiply (column i) by c ≠ 0.


3. DETERMINANTS 


Every square matrix A has a number associated to it called its determinant. In this 
section we will define the determinant and derive some of its properties. The deter- 
minant of a matrix A will be denoted by det A. 


The determinant of a 1 × 1 matrix is just its unique entry

$$\det [a] = a, \tag{3.1}$$

and the determinant of a 2 × 2 matrix is given by the formula

$$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc. \tag{3.2}$$


If we think of a 2 × 2 matrix A as an operator on the space $\mathbb{R}^2$ of real two-dimensional
vectors, as in Section 2, then det A can be interpreted geometrically. Its
absolute value is the area of the parallelogram which forms the image of a unit
square under the operation. For example, the area of the shaded region of Figure
(2.3) is 10. The determinant is positive or negative according to whether the orientation
of the square is preserved or reversed by the operation. Moreover, det A = 0 if
and only if the two columns of A are proportional.
The set of all n × n matrices forms a space of dimension $n^2$, which we denote
by $\mathbb{R}^{n \times n}$. We will regard the determinant of n × n matrices as a function from this
space to the real numbers:

$$\det : \mathbb{R}^{n \times n} \longrightarrow \mathbb{R}.$$

This just means that det is a function of the $n^2$ matrix entries. There is one such
function for each positive integer n. Unfortunately there are many formulas for the
determinant, and all of them are complicated when n is large. The determinant is 
important because it has very nice properties, though there is no simple formula for 
it. Not only are the formulas complicated, but it may not be easy to show directly 
that two of them define the same function. So we will use the following strategy: We 
choose one formula essentially at random and take it as the definition of the determi- 
nant. In that way we are talking about a particular function. We show that the func- 
tion we have chosen has certain very special properties. We also show that our cho- 
sen function is the only one having these properties. Then, to check that some other 
formula defines the same determinant, we have to check only that the function which 

it defines has these same properties. It turns out that this is usually relatively easy. 
The determinant of an n Xn matrix can be computed in terms of certain 
(n — 1) X (n — 1) determinants by a process called expansion by minors. This ex- 
pansion allows us to give a recursive definition of the determinant function. Let A be 


an n × n matrix and let $A_{ij}$ denote the (n − 1) × (n − 1) matrix obtained by crossing
out the ith row and the jth column of A:

$$A_{ij} = \bigl(\text{the matrix } A \text{ with row } i \text{ and column } j \text{ removed}\bigr). \tag{3.3}$$
For example, if

$$A = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 2 \\ 0 & 5 & 1 \end{bmatrix}, \quad \text{then} \quad A_{21} = \begin{bmatrix} 0 & 3 \\ 5 & 1 \end{bmatrix}.$$

Expansion by minors on the first column is the formula

$$\det A = a_{11} \det A_{11} - a_{21} \det A_{21} + \cdots \pm a_{n1} \det A_{n1}. \tag{3.4}$$

The signs alternate. We take this formula, together with (3.1), as a recursive
definition of the determinant. Notice that the formula agrees with (3.2) for 2 × 2
matrices.


The determinant of the matrix A shown above is

$$\det A = 1 \cdot \det \begin{bmatrix} 1 & 2 \\ 5 & 1 \end{bmatrix} - 2 \cdot \det \begin{bmatrix} 0 & 3 \\ 5 & 1 \end{bmatrix} + 0 \cdot \det \begin{bmatrix} 0 & 3 \\ 1 & 2 \end{bmatrix}.$$

The three 2 × 2 determinants which appear here can be computed by expanding by
minors again and using (3.1), or by using (3.2), to get

$$\det A = 1 \cdot (-9) - 2 \cdot (-15) + 0 \cdot (-3) = 21.$$

There are other formulas for the determinant, including expansions by minors on
other columns and on rows, which we will derive presently [see (4.11, 5.1, 5.2)]. 
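
The recursive definition by expansion on the first column translates directly into code. The sketch below is only an illustration with hypothetical names `minor` and `det`; indices start at 0, so the sign of the ith term is $(-1)^i$.

```python
def minor(A, i, j):
    """The matrix obtained from A by crossing out row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion by minors on the first column,
    with the 1 x 1 case det [a] = a as the base of the recursion."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))

A = [[1, 0, 3],
     [2, 1, 2],
     [0, 5, 1]]
value = det(A)   # 1*(-9) - 2*(-15) + 0*(-3)
```

The number of multiplications grows like n!, so this is a definition to reason with rather than a practical algorithm for large n.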
It is important, both for computation of determinants and for theoretical con- 
siderations, to know some of the many special properties satisfied by determinants. 
Most of them can be verified by direct computation and induction on n, using expan- 
sion by minors (3.4). We will list some without giving formal proofs. In order to be 
able to interpret these properties for functions other than the determinant, we will 


denote the determinant by the symbol d for the time being.


(3.5) d(I) = 1.

(3.6) The function d(A) is linear in the rows of the matrix.


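By this we mean the following: Let $R_i$ denote the row vector which is the ith row of
the matrix, so that A can be written symbolically as

$$A = \begin{bmatrix} R_1 \\ \vdots \\ R_n \end{bmatrix}.$$

By definition, linearity in the ith row means that whenever R and S are row vectors
then

$$d \begin{bmatrix} \vdots \\ R + S \\ \vdots \end{bmatrix} = d \begin{bmatrix} \vdots \\ R \\ \vdots \end{bmatrix} + d \begin{bmatrix} \vdots \\ S \\ \vdots \end{bmatrix}$$

and

$$d \begin{bmatrix} \vdots \\ cR \\ \vdots \end{bmatrix} = c \, d \begin{bmatrix} \vdots \\ R \\ \vdots \end{bmatrix},$$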


where the other rows of the matrices appearing in these relations are the same 
throughout. For example, 


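$$\det \begin{bmatrix} 1 & 2 & 4 \\ 3+5 & 4+6 & 2+3 \\ 2 & -1 & 0 \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & 4 \\ 3 & 4 & 2 \\ 2 & -1 & 0 \end{bmatrix} + \det \begin{bmatrix} 1 & 2 & 4 \\ 5 & 6 & 3 \\ 2 & -1 & 0 \end{bmatrix}$$

and

$$\det \begin{bmatrix} 1 & 2 & 4 \\ 2 \cdot 5 & 2 \cdot 6 & 2 \cdot 3 \\ 2 & -1 & 0 \end{bmatrix} = 2 \cdot \det \begin{bmatrix} 1 & 2 & 4 \\ 5 & 6 & 3 \\ 2 & -1 & 0 \end{bmatrix}.$$

Linearity allows us to operate on one row at a time, with the other rows left fixed.
Another property:

(3.7) If two adjacent rows of a matrix A are equal, then d(A) = 0.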


Let us prove this fact by induction on n. Suppose that rows j and j + 1 are equal.
Then the matrices $A_{i1}$ defined by (3.3) also have two rows equal, except when i = j
or i = j + 1. When $A_{i1}$ has two equal rows, its determinant is zero by induction.
Thus only two terms of (3.4) are different from zero, and

$$d(A) = \pm a_{j1} \, d(A_{j1}) \mp a_{j+1,1} \, d(A_{j+1,1}).$$

Moreover, since the rows $R_j$ and $R_{j+1}$ are equal, it follows that $A_{j1} = A_{j+1,1}$ and that
$a_{j1} = a_{j+1,1}$. Since the signs alternate, the two terms on the right side cancel, and
the determinant is zero.

Properties (3.5–3.7) characterize determinants uniquely [see (3.14)], and we
will derive further relations from them without going back to definition (3.4).


(3.8) If a multiple of one row is added to an adjacent row,
the determinant is unchanged.

For, by (3.6) and (3.7),

$$d \begin{bmatrix} \vdots \\ R \\ S + cR \\ \vdots \end{bmatrix} = d \begin{bmatrix} \vdots \\ R \\ S \\ \vdots \end{bmatrix} + c \, d \begin{bmatrix} \vdots \\ R \\ R \\ \vdots \end{bmatrix} = d \begin{bmatrix} \vdots \\ R \\ S \\ \vdots \end{bmatrix}.$$

The same reasoning works if S is above R.


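(3.9) If two adjacent rows are interchanged,
the determinant is multiplied by −1.


We apply (3.8) repeatedly:

$$d \begin{bmatrix} R \\ S \end{bmatrix} = d \begin{bmatrix} R \\ S - R \end{bmatrix} = d \begin{bmatrix} R + (S - R) \\ S - R \end{bmatrix} = d \begin{bmatrix} S \\ S - R \end{bmatrix} = d \begin{bmatrix} S \\ -R \end{bmatrix} = -d \begin{bmatrix} S \\ R \end{bmatrix}.$$

(3.7') If two rows of a matrix A are equal, then d(A) = 0.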


For, interchanging adjacent rows a few times results in a matrix A’ with two adjacent 
rows equal. By (3.7) d(A') = 0, and by (3.9) d(A) = ±d(A').
Using (3.7’), the proofs of (3.8) and (3.9) show the following: 


(3.8') If a multiple of one row is added to another row,
the determinant is not changed.


(3.9') If two rows are interchanged,
the determinant is multiplied by −1.


Also, (3.6) implies the following: 
(3.10) If a row of A is zero, then d(A) = 0. 


If a row is zero, then A doesn't change when we multiply that row by 0. But according
to (3.6), d(A) gets multiplied by 0. Thus d(A) = 0 · d(A) = 0.

Rules (3.8'), (3.9'), and (3.6) describe the effect of an elementary row operation
(2.7) on the determinant, so they can be rewritten in terms of the elementary
matrices. They tell us that d(EA) = d(A) if E is an elementary matrix of the first
kind, that d(EA) = −d(A) if E is of the second kind, and that d(EA) = c·d(A) if
E is of the third kind. Let us apply these rules to compute d(E) when E is an elementary
matrix. We substitute A = I. Then, since d(I) = 1, the rules determine
d(EI) = d(E):
(3.11) The determinant of an elementary matrix is:


(i) First kind (add a multiple of one row to another): d(E) = 1, by (3.8').
(ii) Second kind (row interchange): d(E) = −1, by (3.9').
(iii) Third kind (multiply a row by a nonzero constant): d(E) = c, by (3.6).


Moreover, if we use rules (3.8'), (3.9'), and (3.6) again, applying them this time to
an arbitrary matrix A and using the values for d(E) which have just been determined,
we obtain the following:

(3.12) Let E be an elementary matrix and let A be arbitrary. Then
d(EA) = d(E)d(A).

Recall from (2.19) that every square matrix A can be reduced by elementary
row operations to a matrix A' which is either the identity I or else has its bottom row
zero:

$$A' = E_k \cdots E_1 A.$$


We know by (3.5) and (3.10) that d(A') = 1 or d(A') = 0 according to the case. By
(3.12) and induction,

$$d(A') = d(E_k) \cdots d(E_1) \, d(A). \tag{3.13}$$

We also know $d(E_i)$, by (3.11), and hence we can use this formula to compute d(A).
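
Formula (3.13) suggests a way of computing d(A) in practice: row reduce A while keeping track of the product of the values d(E) of the operations used. The following is a sketch under that reading, with a hypothetical function name and exact rational arithmetic assumed.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """Compute d(A) from d(A') = d(E_k)...d(E_1) d(A), where A' is the
    result of row reduction and each d(E_i) is 1, -1, or the scalar used."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    factor = Fraction(1)   # running product d(E_k)...d(E_1)
    for col in range(n):
        src = next((r for r in range(col, n) if M[r][col] != 0), None)
        if src is None:
            # No pivot in this column: the reduced matrix has a zero
            # bottom row, so d(A') = 0 and hence d(A) = 0.
            return Fraction(0)
        if src != col:
            M[col], M[src] = M[src], M[col]
            factor *= -1                       # second kind: d(E) = -1
        c = M[col][col]
        M[col] = [x / c for x in M[col]]
        factor *= 1 / c                        # third kind with scalar 1/c
        for r in range(n):
            if r != col and M[r][col] != 0:
                a = M[r][col]
                M[r] = [x - a * y for x, y in zip(M[r], M[col])]
                # first kind: d(E) = 1, so factor is unchanged
    # Now M = I, so 1 = d(I) = factor * d(A).
    return 1 / factor

d = det_by_row_reduction([[1, 0, 3], [2, 1, 2], [0, 5, 1]])
```

Unlike expansion by minors, this takes on the order of n³ arithmetic operations.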


(3.14) Theorem. Axiomatic Characterization of the Determinant: The determinant
function (3.4) is the only one satisfying rules (3.5–3.7).

Proof. We used only these rules to arrive at equations (3.11) and (3.13), and
they determine d(A). Since the expansion by minors (3.4) satisfies (3.5–3.7), it
agrees with (3.13). □


We will now return to our usual notation det A for the determinant of a matrix. 
(3.15) Corollary. A square matrix A is invertible if and only if det A ≠ 0.

This follows from formulas (3.11), (3.13), and (2.18). By (3.11), det $E_i$ ≠ 0 for all
i. Thus if A' is as in (3.13), then det A ≠ 0 if and only if det A' ≠ 0, which is the
case if and only if A' = I. By (2.18), A' = I if and only if A is invertible. □


We can now prove one of the most important properties of the determinant 
function: its compatibility with matrix multiplications. 


(3.16) Theorem. Let A, B be any two n × n matrices. Then
det(AB) = (det A)(det B).

Proof. We note that this is (3.12) if A is an elementary matrix.


Case 1: A is invertible. By (2.18b), A is a product of elementary matrices:
$A = E_1 \cdots E_k$. By (3.12) and induction, $\det A = (\det E_1) \cdots (\det E_k)$, and
$\det AB = \det(E_1 \cdots E_k B) = (\det E_1) \cdots (\det E_k)(\det B) = (\det A)(\det B)$.


Case 2: A is not invertible. Then det A = 0 by (3.15), and so the theorem will follow
in this case if we show that det(AB) = 0 too. By (2.18), A can be reduced to a
matrix $A' = E_k \cdots E_1 A$ having bottom row zero. Then the bottom row of A'B is also
zero; hence

$$0 = \det(A'B) = \det(E_k \cdots E_1 AB) = (\det E_k) \cdots (\det E_1)(\det AB).$$

Since det $E_i$ ≠ 0, it follows that det AB = 0. □


(3.17) Corollary. If A is invertible, $\det(A^{-1}) = \dfrac{1}{\det A}$.

Proof. $(\det A)(\det A^{-1}) = \det I = 1$. □


Note. It is a natural idea to try to define determinants using rules (3.11) and 
(3.16). These rules certainly determine det A for every invertible matrix A, since we 
can write such a matrix as a product of elementary matrices. But there is a problem. 
Namely, there are many ways to write a given matrix as a product of elementary 
matrices. Without going through some steps as we have, it is not clear that two such 
products would give the same answer for the determinant. It is actually not particu- 
larly easy to make this idea work. 


The proof of the following proposition is a good exercise. 


(3.18) Proposition. Let $A^t$ denote the transpose of A. Then
$\det A = \det A^t$. □


(3.19) Corollary. Properties (3.6–3.10) continue to hold if the word row is replaced
by column throughout. □


4. PERMUTATION MATRICES 


A bijective map p from a set S to itself is called a permutation of the set: 
(4.1) p: S → S.

For example,

$$\begin{aligned} 1 &\mapsto 3 \\ 2 &\mapsto 1 \\ 3 &\mapsto 2 \end{aligned} \tag{4.2}$$

is a permutation of the set {1, 2, 3}. It is called a cyclic permutation because it operates
as

$$1 \to 3 \to 2 \to 1.$$

There are several notations for permutations. We will use function notation in
this section, so that p(x) denotes the value of the permutation p on the element x.
Thus if p is the permutation given in (4.2), then

p(1) = 3, p(2) = 1, p(3) = 2.


A permutation matrix P is a matrix with the following property: The operation
of left multiplication by P is a permutation of the rows of a matrix. The elementary
matrices of the second type (2.6ii) are the simplest examples. They correspond to
the permutations called transpositions, which interchange two rows of a matrix,
leaving the others alone. Also,

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \tag{4.3}$$

is a permutation matrix. It acts on a column vector $X = (x, y, z)^t$ as

$$PX = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} y \\ z \\ x \end{bmatrix}.$$

The entry in the first position is sent to the third position, and so on, so P has permuted
rows according to the cyclic permutation p given in (4.2).


There is one point which can cause confusion and which makes it important
for us to establish our notation carefully. When we permute the entries of a vector
$(x_1, \ldots, x_n)^t$ according to a permutation p, the indices are permuted in the opposite
way. For instance, multiplying the column vector $X = (x_1, x_2, x_3)^t$ by the matrix in
(4.3) gives

$$PX = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 \\ x_3 \\ x_1 \end{bmatrix}. \tag{4.4}$$

The indices in (4.4) are permuted by 1 ↦ 2 ↦ 3 ↦ 1, which is the inverse of
the permutation p. Thus there are two ways to associate a permutation to a permutation
matrix P: the permutation p which describes how P permutes the entries, and the
inverse operation which describes the effect on indices. We must make a decision, so
we will say that the permutation associated to P is the one which describes its action
on the entries of a column vector. Then the indices are permuted in the opposite
way, so

$$P \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_{p^{-1}(1)} \\ \vdots \\ x_{p^{-1}(n)} \end{bmatrix}. \tag{4.5}$$

Multiplication by P has the corresponding effect on the rows of an n × r matrix A.

The permutation matrix P can be written conveniently in terms of the matrix
units (2.5) or in terms of certain column vectors called the standard basis and denoted
by $e_i$. The vector $e_i$ has a 1 in the ith position as its single nonzero entry, so
these vectors are the matrix units for an n × 1 matrix.


(4.6) Proposition. Let P be the permutation matrix associated to a permutation p. 


(a) The jth column of P is the column vector $e_{p(j)}$.


(b) P is a sum of n matrix units: $P = e_{p(1)1} + \cdots + e_{p(n)n} = \sum_j e_{p(j)j}$. □


A permutation matrix P always has a single 1 in each row and in each column,
the rest of its entries being 0. Conversely, any such matrix is a permutation matrix.


(4.7) Proposition. 


(a) Let p,q be two permutations, with associated permutation matrices P,Q. Then 
the matrix associated to the permutation pq is the product PQ. 


(b) A permutation matrix P is invertible, and its inverse is the transpose matrix:
$P^{-1} = P^t$.


Proof. By pq we mean the composition of the two permutations

(4.8) pq(i) = p(q(i)).


Since P operates by permuting rows according to p and Q operates by permuting according
to q, the associative law for matrix multiplication tells us that PQ permutes
according to pq:

(PQ)X = P(QX).


Thus PQ is the permutation matrix associated to pq. This proves (a). We leave the
proof of (b) as an exercise. □
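
Both parts of the proposition can be checked on small examples. In the sketch below a permutation is stored as a list `p` with `p[j]` the image of `j`, using indices 0, ..., n − 1 instead of 1, ..., n; this encoding and the helper names are assumptions made for the illustration.

```python
def perm_matrix(p):
    """Permutation matrix whose jth column is the standard basis vector e_{p(j)}."""
    n = len(p)
    return [[1 if i == p[j] else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Ordinary matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def compose(p, q):
    """The composition pq, defined by pq(i) = p(q(i))."""
    return [p[q[i]] for i in range(len(q))]

p = [2, 0, 1]     # 0-based form of the cyclic permutation 1 -> 3, 2 -> 1, 3 -> 2
q = [1, 0, 2]     # a transposition interchanging the first two indices
P, Q = perm_matrix(p), perm_matrix(q)

moved = matmul(P, [[10], [20], [30]])   # the first entry, 10, lands in the third position
identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

With these definitions, `perm_matrix(compose(p, q))` agrees with `matmul(P, Q)`, and `matmul(P, transpose(P))` is the identity.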


The determinant of a permutation matrix is easily seen to be ±1, using rule
(3.9). This determinant is called the sign of a permutation:

(4.9) sign p = det P = ±1.

The permutation (4.2) has sign +1, while any transposition has sign −1 [see
(3.11ii)]. A permutation p is called odd or even according to whether its sign is −1
or +1.

Let us now go back to an arbitrary n × n matrix A and use linearity of the determinant
(3.6) to expand det A. We begin by working on the first row. Applying
(3.6), we find that
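$$\det A = \det \begin{bmatrix} a_{11}\ 0\ \cdots\ 0 \\ R_2 \\ \vdots \\ R_n \end{bmatrix} + \det \begin{bmatrix} 0\ a_{12}\ 0\ \cdots\ 0 \\ R_2 \\ \vdots \\ R_n \end{bmatrix} + \cdots + \det \begin{bmatrix} 0\ \cdots\ 0\ a_{1n} \\ R_2 \\ \vdots \\ R_n \end{bmatrix}.$$

We continue expanding each of these determinants on the second row, and so on.
When we are finished, det A is expressed as a sum of many terms, each of which is
the determinant of a matrix M having only one entry left in each row:

$$M = \begin{bmatrix} & a_{1?} & & \\ a_{2?} & & & \\ & & \ddots & \\ & & & a_{n?} \end{bmatrix}.$$

Many of these determinants will be zero because a whole column vanishes. Thus the
determinant of a 2 × 2 matrix is the sum of four terms:

$$\begin{aligned} \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= \det \begin{bmatrix} a & 0 \\ c & d \end{bmatrix} + \det \begin{bmatrix} 0 & b \\ c & d \end{bmatrix} \\ &= \det \begin{bmatrix} a & 0 \\ c & 0 \end{bmatrix} + \det \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} + \det \begin{bmatrix} 0 & b \\ c & 0 \end{bmatrix} + \det \begin{bmatrix} 0 & b \\ 0 & d \end{bmatrix}. \end{aligned}$$

But the first and fourth terms are zero; therefore

$$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \det \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} + \det \begin{bmatrix} 0 & b \\ c & 0 \end{bmatrix}.$$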


In fact, the matrices M having no column zero must have one entry $a_{ij}$ left in each
row and each column. They are like permutation matrices P, except that the 1's
in P are replaced by the entries of A:

$$P = \sum_j e_{p(j)j}, \qquad M = \sum_j a_{p(j)j} \, e_{p(j)j}. \tag{4.10}$$

By linearity of the determinant (3.6),

$$\det M = (a_{p(1)1} \cdots a_{p(n)n})(\det P) = (\text{sign } p)(a_{p(1)1} \cdots a_{p(n)n}).$$

There is one such term for each permutation p. This leads to the formula

$$\det A = \sum_{\text{perm } p} (\text{sign } p) \, a_{p(1)1} \cdots a_{p(n)n}, \tag{4.11}$$
where the sum is over all permutations of the set {1, ..., n}. It seems slightly nicer to
write this formula in its transposed form:

$$\det A = \sum_{\text{perm } p} (\text{sign } p) \, a_{1p(1)} \cdots a_{np(n)}. \tag{4.12}$$

This is called the complete expansion of the determinant.
For example, the complete expansion of the determinant of a 3 × 3 matrix has
six terms:

$$\det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}. \tag{4.13}$$


The complete expansion is more of theoretical than of practical importance,
because it has too many terms to be useful for computation unless n is small. Its theoretical
importance comes from the fact that determinants are exhibited as polynomials
in the $n^2$ variable matrix entries $a_{ij}$, with coefficients ±1. This has important
consequences. Suppose, for example, that each matrix entry $a_{ij}$ is a differentiable
function of a single variable: $a_{ij} = a_{ij}(t)$. Then det A is also a differentiable function
of t, because sums and products of differentiable functions are differentiable.
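
For small n the complete expansion can be evaluated directly. The sketch below sums over all permutations of {0, ..., n − 1}; as a shortcut it computes the sign from the parity of the number of inversions, which is a standard equivalent of the definition sign p = det P rather than the definition itself. The function names are hypothetical.

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation given as a tuple of images of 0, ..., n-1,
    computed as (-1) raised to the number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det_complete(A):
    """Complete expansion: the sum over all permutations p of
    (sign p) * a_{1 p(1)} * ... * a_{n p(n)}."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

value = det_complete([[1, 0, 3], [2, 1, 2], [0, 5, 1]])
```

The loop runs over n! permutations, which is why this formula is of theoretical rather than computational interest.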


5. CRAMER’S RULE 


The name Cramer's Rule is applied to a group of formulas giving solutions of sys- 
tems of linear equations in terms of determinants. To derive these formulas we need 
to use expansion by minors on columns other than the first one, as well as on rows. 


(5.1) Expansion by minors on the jth column:

$$\det A = (-1)^{1+j} a_{1j} \det A_{1j} + (-1)^{2+j} a_{2j} \det A_{2j} + \cdots + (-1)^{n+j} a_{nj} \det A_{nj}.$$

(5.2) Expansion by minors on the ith row:

$$\det A = (-1)^{i+1} a_{i1} \det A_{i1} + (-1)^{i+2} a_{i2} \det A_{i2} + \cdots + (-1)^{i+n} a_{in} \det A_{in}.$$

In these formulas $A_{ij}$ is the matrix (3.3). The terms $(-1)^{i+j}$ provide alternating signs
depending on the position (i, j) in the matrix. (I doubt that such tricky notation is really
helpful, but it has become customary.) The signs can be read off of the following
figure:

$$\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{5.3}$$

To prove (5.1), one can proceed in either of two ways:


(a) Verify properties (3.5—3.7) for (5.1) directly and apply Theorem (3.14), or 
(b) Interchange (column j) with (column 1) and apply (3.9’) and (3.19). 


We omit these verifications. Once (5.1) is proved, (5.2) can be derived from it by 
transposing the matrix and applying (3.18). 


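(5.4) Definition. Let A be an n × n matrix. The adjoint of A is the n × n matrix
whose (i, j) entry $(\text{adj})_{ij}$ is $(-1)^{i+j} \det A_{ji} = \alpha_{ji}$, where $A_{ij}$ is the matrix obtained by
crossing out the ith row and the jth column, as in (3.3):

$$(\text{adj } A) = (\alpha_{ij})^t,$$

where $\alpha_{ij} = (-1)^{i+j} \det A_{ij}$. Thus

$$\text{adj} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{5.5}$$

and

$$\text{adj} \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 1 & -2 \\ -2 & 0 & 1 \\ -3 & -1 & 2 \end{bmatrix}^t = \begin{bmatrix} 4 & -2 & -3 \\ 1 & 0 & -1 \\ -2 & 1 & 2 \end{bmatrix}. \tag{5.6}$$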


We can now proceed to derive the formula called Cramer's Rule. 
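(5.7) Theorem. Let δ = det A. Then
(adj A)·A = δI, and A·(adj A) = δI.


Note that in these equations

$$\delta I = \begin{bmatrix} \delta & & \\ & \ddots & \\ & & \delta \end{bmatrix}.$$

(5.8) Corollary. Suppose that the determinant δ of A is not zero. Then

$$A^{-1} = \frac{1}{\delta} (\text{adj } A).$$

For example, the inverse of the 2 × 2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is

$$\frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$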


The determinant of the 3 × 3 matrix whose adjoint is computed in (5.6) happens to
be 1; therefore for that matrix, $A^{-1} = \text{adj } A$.
The proof of Theorem (5.7) is easy. The (i, j) entry of (adj A)·A is

$$(\text{adj})_{i1} a_{1j} + \cdots + (\text{adj})_{in} a_{nj} = \alpha_{1i} a_{1j} + \cdots + \alpha_{ni} a_{nj}. \tag{5.9}$$

If i = j, this is formula (5.1) for δ, which is the required answer. Suppose i ≠ j.
Consider the matrix B obtained by replacing (column i) by (column j) in the matrix
A. So (column j) appears twice in the matrix B. Then (5.9) is expansion by minors
for B on its ith column. But det B = 0 by (3.7') and (3.19). So (5.9) is zero, as required.
The second equation of Theorem (5.7) is proved similarly. □
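
The definition of the adjoint and the identity of Theorem (5.7) can be checked on a small example. The code below is a sketch with hypothetical helper names; it uses recursive expansion on the first column for the determinants, indexes rows and columns from 0, and assumes n ≥ 2.

```python
def minor(A, i, j):
    """A with row i and column j crossed out (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion by minors on the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def adjoint(A):
    """The (i, j) entry is (-1)^(i+j) * det A_ji; note the transposed indices."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1, 2],
     [0, 2, 1],
     [1, 0, 2]]
adjA = adjoint(A)
product = matmul(adjA, A)   # should equal (det A) * I
```

Since det A = 1 for this matrix, `product` is the identity and `adjA` is the inverse of A.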


Formula (5.8) can be used to write the solution of a system of linear equations
AX = B, where A is an n × n matrix, in a compact form, provided that det A ≠ 0.
Multiplying both sides by $A^{-1}$, we obtain

$$X = A^{-1}B = \frac{1}{\delta} (\text{adj } A) B, \tag{5.10}$$

where δ = det A. The product on the right can be expanded out to obtain the formula

$$x_j = \frac{1}{\delta} (b_1 \alpha_{1j} + \cdots + b_n \alpha_{nj}), \tag{5.11}$$

where $\alpha_{ij} = \pm \det A_{ij}$ as above.

Notice that the main term $(b_1 \alpha_{1j} + \cdots + b_n \alpha_{nj})$ on the right side of (5.11)
looks like the expansion of the determinant by minors on the jth column, except that
$b_i$ has replaced $a_{ij}$. We can incorporate this observation to get another expression for
the solution of the system of equations. Let us form a new matrix $M_j$, replacing the
jth column of A by the column vector B. Expansion by minors on the jth column
shows that

$$\det M_j = (b_1 \alpha_{1j} + \cdots + b_n \alpha_{nj}).$$

This gives us the tricky formula

$$x_j = \frac{\det M_j}{\det A}. \tag{5.12}$$




For some reason it is popular to write the solution of the system of equations AX = B in this form, and it is often this form that is called Cramer's Rule. However, this expression does not simplify computation. The main thing to remember is expression (5.8) for the inverse of a matrix in terms of its adjoint; the other formulas follow from this expression.

As with the complete expansion of the determinant (4.10), formulas (5.8-5.11) have theoretical as well as practical significance, because the answers $A^{-1}$ and X are exhibited explicitly as quotients of polynomials in the variables $\{a_{ij}, b_i\}$, with integer coefficients. If, for instance, $a_{ij}$ and $b_j$ are all continuous functions of t, so are the solutions $x_i$.
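To make formula (5.12) concrete, the following Python sketch solves a small system by forming each matrix $M_j$ and dividing determinants. The function names `det` and `cramer_solve` and the sample system are assumptions made for illustration; exact rational arithmetic comes from the standard `fractions` module. As the text warns, this is not an efficient method of computation.

```python
from fractions import Fraction

def det(m):
    """Determinant by expansion by minors on the first column."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for i, row in enumerate(m):
        sub = [r[1:] for k, r in enumerate(m) if k != i]  # cross out row i, column 0
        total += (-1) ** i * row[0] * det(sub)
    return total

def cramer_solve(a, b):
    """Solve a x = b for an integer matrix a with det a != 0, using x_j = det M_j / det a."""
    d = det(a)
    if d == 0:
        raise ValueError("det A = 0, so formula (5.12) does not apply")
    solution = []
    for j in range(len(a)):
        # M_j is a with its jth column replaced by the column vector b.
        m_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(m_j), d))
    return solution

# Illustrative system: 2x + y = 3, x + 3y = 5.
A = [[2, 1], [1, 3]]
B = [3, 5]
X = cramer_solve(A, B)
print(X)
```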


A general algebraical determinant in its developed form 

may be likened to a mixture of liquids seemingly homogeneous, 

but which, being of differing boiling points, admit of being separated 
by the process of fractional distillation. 


James Joseph Sylvester


EXERCISES 
1. The Basic Operations 


ies 
1. What are the entries a2; and a2; of the matrix | 2 7 8]? 
09 4 


2. Compute the products AB and BA for the following values of A and B. 


-8 -4 
@ a= |; A te = 9 5 


ees ee! 3-2 

Siow F164 

a | al 7 E | 

()A=[-1|,e=[1 2 1] 

0) 
by 
3. Let A = (a,...,@n) be a row vector, and let B = | . | be a column vector. Compute 

the products AB and BA. bn 


4. Verify the associative law for the matrix product 


| 
bes | 
° 3 
. Compute the product ' al |


Notice that this is a self-checking problem. You have to multiply correctly, or it won't come out. If you need more practice in matrix multiplication, use this problem as a model.


. Compute |! 1 : 


1 
Find a formula for 1 1] , and prove it by induction. 
I 


. Compute the following matrix products by block multiplication: 


. Prove rule (1.20) for block multiplication. 
10. 


il. 


12. 


Re 


14. 
15. 
16. 


17. 


Let A,B be square matrices. 
(a) When is $(A + B)(A - B) = A^2 - B^2$?
(b) Expand (A + B)*. 


Let D be the diagonal matrix 
d 
d, 


i 


and let $A = (a_{ij})$ be any n × n matrix.
(a) Compute the products DA and AD. 


Omi 2 
Obed 
3 | 0 1 
dp, 


(b) Compute the product of two diagonal matrices. 


(c) When is a diagonal matrix invertible? 


An n × n matrix is called upper triangular if $a_{ij} = 0$ whenever i > j. Prove that the product of two upper triangular matrices is upper triangular.


In each case, find all real 2 x 2 matrices which commute with the given matrix. 


aff) of off) af 


3 
| 


|| 


oe) 
0 6 


Prove the properties 0 + A = A, OA = 0, and AO = 0 of zero matrices. 
Prove that a matrix which has a row of zeros is not invertible. 
A square matrix A is called nilpotent if $A^k = 0$ for some k > 0. Prove that if A is nilpotent, then I + A is invertible.


(a) Find infinitely many matrices B such that BA = 


2 
Al 
2 


(b) Prove that there is no matrix C such that AC 


mtu tw 


II 


72 when 


TX: 
18. Write out the proof of Proposition (1.18) carefully, using the associative law to expand the product $(AB)(B^{-1}A^{-1})$.

19. The trace of a square matrix is the sum of its diagonal entries:
$\operatorname{tr} A = a_{11} + a_{22} + \cdots + a_{nn}.$
(a) Show that tr(A + B) = tr A + tr B, and that tr AB = tr BA.
(b) Show that if B is invertible, then $\operatorname{tr} A = \operatorname{tr} BAB^{-1}$.

20. Show that the equation AB − BA = I has no solutions in n × n matrices with real entries.


2. Row Reduction 


. (a) For the reduction of the matrix M (2.10) given in the text, determine the elementary 


matrices corresponding to each operation. 
(b) Compute the product P of these elementary matrices and verify that PM is indeed the 
end result. 


. Find all solutions of the system of equations AX = B when 


|i ae) a 
A=|3 0 0 4 
base 
and B has the following value: 


0 ! 0 
(a) |}O}] (by; 1] |] 2 
0 0 2 


. Find all solutions of the equation $x_1 + x_2 + 2x_3 - x_4 = 3$.
. Determine the elementary matrices which are used in the row reduction in Example (2.22) and verify that their product is $A^{-1}$.


. Find inverses of the following matrices: 


Mob kb PE ab E lb “TE 2} 


. Make a sketch showing the effect of multiplication by the matrix A = 3 a on the 


plane $\mathbb{R}^2$.


. How much can a matrix be simplified if both row and column operations are allowed? 
. (a) Compute the matrix product $e_{ij}e_{kl}$.
(b) Write the identity matrix as a sum of matrix units.
(c) Let A be any n × n matrix. Compute $e_{ii}Ae_{jj}$.
(d) Compute $e_{ij}A$ and $Ae_{ij}$.


. Prove rules (2.7) for the operations of elementary matrices. 
. Let A be a square matrix. Prove that there is a set of elementary matrices $E_1, \ldots, E_k$ such that $E_k \cdots E_1A$ either is the identity or has its bottom row zero.


. Prove that every invertible 2 x 2 matrix is a product of at most four elementary matrices. 


Prove that if a product AB of n X n matrices is invertible then so are the factors A, B. 


A matrix A is called symmetric if $A = A^t$. Prove that for any matrix A, the matrix $AA^t$ is symmetric and that if A is a square matrix then $A + A^t$ is symmetric.

. (a) Prove that $(AB)^t = B^tA^t$ and that $A^{tt} = A$.
(b) Prove that if A is invertible then $(A^{-1})^t = (A^t)^{-1}$.


. Prove that the inverse of an invertible symmetric matrix is also symmetric. 
. Let A and B be symmetric n X n matrices. Prove that the product AB is symmetric if and 


only if AB = BA. 


. Let A be an n X n matrix. Prove that the operator “left multiplication by A” determines A 


in the following sense: If AX = BX for every column vector X, then A = B.


. Consider an arbitrary system of linear equations AX = B where A and B have real entries. 


(a) Prove that if the system of equations AX = B has more than one solution then it has 
infinitely many. 

(b) Prove that if there is a solution in the complex numbers then there is also a real solu- 
tion. 


Prove that the reduced row echelon form obtained by row reduction of a matrix A is 
uniquely determined by A. 


3. Determinants 


1. Evaluate the following determinants: 
701 Pro 050 
l i | ee | 5.2, 00 
3 [Saree w {1 t] (c) eer te ls 6 3 6 
09 7 4 
14 13 
Zao 2 0 
© i, 100 
2 10m 0 “0 
tee 220 my 5S 
eae ey ees) res 7% 
2. Prove that det ee ee ae det 002 11° 
Ane D ed AS 2441 4 


. Verify the rule det AB = (det A)(det 8) for the matrices A = [; a B= : 7h Note 


that this is a self-checking problem. It can be used as a model for practice in computing 
determinants. 


4. Compute the determinants of the following n × n matrices by induction on n.


=) 
2-1 
(a) -1 2-1 
—] oe 
2-1 
l -1 2 
3 oo 8 
3 
3 


Lez 
Pies 
3m 3 


5. Evaluate det |” 
*6. Compute det


=D = 


1 2 


7. Prove that the determinant is linear in the rows of a matrix, as asserted in (3.6). 
8. Let A be an n X n matrix. What is det (—A)? 
9. Prove that $\det A^t = \det A$.


10. Derive the formula $\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$ from the properties (3.5, 3.6, 3.7, 3.9).

11. Let A and B be square matrices. Prove that det(AB) = det(BA).

12. Prove that $\det\begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = (\det A)(\det D)$, if A and D are square blocks.

*13. Let a 2n × 2n matrix be given in the form $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$, where each block is an n × n matrix. Suppose that A is invertible and that AC = CA. Prove that det M = det(AD − CB). Give an example to show that this formula need not hold when AC ≠ CA.


4. Permutation Matrices 


1. Consider the permutation p defined by $1 \mapsto 3$, $2 \mapsto 1$, $3 \mapsto 4$, $4 \mapsto 2$.
(a) Find the associated permutation matrix P. 
(b) Write p as a product of transpositions and evaluate the corresponding matrix product. 
(c) Compute the sign of p. 
2. Prove that every permutation matrix is a product of transpositions. 
3. Prove that every matrix with a single 1 in each row and a single 1 in each column, the other entries being zero, is a permutation matrix.
4. Let p be a permutation. Prove that $\operatorname{sign} p = \operatorname{sign} p^{-1}$.
5. Prove that the transpose of a permutation matrix P is its inverse.
6. What is the permutation matrix associated to the permutation $i \mapsto n - i$?
7. (a) The complete expansion for the determinant of a 3 × 3 matrix consists of six triple products of matrix entries, with sign. Learn which they are.
(b) Compute the determinant of the following matrices using the complete expansion, and check your work by another method:


"nr Se 


a a 


ale=jl a li @ 
lee tee |e ede) | 
1-1 1 rele 


I 
4 
2 


— WN bt 


oN 


8. Prove that the complete expansion (4.12) defines the determinant by verifying rules (3.5-3.7).
9. Prove that formulas (4.11) and (4.12) define the same number.

5. Cramer's Rule


1. Let \¢ A] be a matrix with determinant |. What is A7'? 
12 1 1-2 =a 
2. (self-checking) Compute the adjoints of the matrices E ak 2 AS Za ol ee 
02 1 = al 


b 
and} 1 O 1 J, and verify Theorem (5.7) for them. 
l 


3. Let A be an n × n matrix with integer entries $a_{ij}$. Prove that $A^{-1}$ has integer entries if and only if $\det A = \pm 1$.


4. Prove that expansion by minors on a row of a matrix defines the determinant function. 
Miscellaneous Problems 


oer fille Pe : 
1. Write the matrix E il as a product of elementary matrices, using as few as vou can. 


Prove that your expression is as short as possible. 


2. Find a representation of the complex numbers by real 2 × 2 matrices which is compatible with addition and multiplication. Begin by finding a nice solution to the matrix equation $A^2 = -I$.

3. (Vandermonde determinant) (a) Prove that $\det\begin{bmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{bmatrix} = (b - a)(c - a)(c - b)$.


*(b) Prove an analogous formula for n X n matrices by using row operations to clear out 
the first column cleverly. 
*4. Consider a general system AX = B of m linear equations in n unknowns. If the coefficient matrix A has a left inverse A', a matrix such that A'A = I, then we may try to solve the system as follows:

AX = B
A'AX = A'B
X = A'B.

But when we try to check our work by running the solution backward, we get into trouble:

X = A'B
AX = AA'B
AX = B.

We seem to want A' to be a right inverse: AA' = I, which isn't what was given. Explain. (Hint: Work out some examples.)
5. (a) Let A be a real 2 × 2 matrix, and let $A_1, A_2$ be the rows of A. Let P be the parallelogram whose vertices are $0, A_1, A_2, A_1 + A_2$. Prove that the area of P is the absolute value of the determinant det A by comparing the effect of an elementary row operation on the area and on det A.
*(b) Prove an analogous result for n × n matrices.

*6. Most invertible matrices can be written as a product A = LU of a lower triangular matrix L and an upper triangular matrix U, where in addition all diagonal entries of U are 1.
(a) Prove uniqueness, that is, prove that there is at most one way to write A as a product.
(b) Explain how to compute L and U when the matrix A is given.
(c) Show that every invertible matrix can be written as a product LPU, where L, U are as above and P is a permutation matrix.

7. Consider a system of n linear equations in n unknowns: AX = B, where A and B have integer entries. Prove or disprove the following.
(a) The system has a rational solution if det A ≠ 0.
(b) If the system has a rational solution, then it also has an integer solution.

*8. Let A, B be m × n and n × m matrices. Prove that $I_m - AB$ is invertible if and only if $I_n - BA$ is invertible.


Chapter 2 


Groups 


Il est peu de notions en mathématiques qui soient plus primitives 
que celle de loi de composition. 


Nicolas Bourbaki 


1. THE DEFINITION OF A GROUP


In this chapter we study one of the most important algebraic concepts, that of a group. A group is a set on which a law of composition is defined, such that all elements have inverses. The precise definition is given below in (1.10). For example, the set of nonzero real numbers forms a group $\mathbb{R}^\times$ under multiplication, and the set of all real numbers forms a group $\mathbb{R}^+$ under addition. The set of invertible n × n matrices, called the general linear group, is a very important example in which the law of composition is matrix multiplication. We will see many more examples as we go along.

By a law of composition on a set S, we mean a rule for combining pairs a, b of elements of S to get another element, say p, of S. The original models for this notion are addition and multiplication of real numbers. Formally, a law of composition is a function of two variables on S, with values in S, or it is a map

$S \times S \to S$
$(a, b) \mapsto p.$


Here, S × S denotes, as always, the product set of pairs (a, b) of elements of S.

Functional notation p = f(a, b) isn't very convenient for laws of composition. Instead, the element obtained by applying the law to a pair (a, b) is usually denoted using a notation resembling those used for multiplication or addition:

$p = ab,\ a \times b,\ a \circ b,\ a + b,$ and so on,


a choice being made for the particular law in question. We call the element p the 
product or sum of a and b, depending on the notation chosen. 
Our first example of a law of composition, and one of the two main examples, is matrix multiplication on the set S of n × n matrices.

We will use the product notation ab most frequently. Anything we prove with product notation can be rewritten using another notation, such as addition. It will continue to be valid, because the rewriting is just a change of notation.

It is important to note that the symbol ab is a notation for a certain element of S. Namely, it is the element obtained by applying the given law of composition to the elements called a and b. Thus if the law is multiplication of matrices and if $a = \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}$ and $b = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}$, then ab denotes the matrix $\begin{bmatrix} 7 & 3 \\ 4 & 2 \end{bmatrix}$. Once the product ab has been evaluated, the elements a and b can not be recovered from it.

Let us consider a law of composition written multiplicatively as ab. It will be called associative if the rule


(1.1)   (ab)c = a(bc)   (associative law)

holds for all a, b, c in S, and commutative if

(1.2)   ab = ba   (commutative law)


holds for all a,b in S. Our example of matrix multiplication is associative but not 
commutative. 

When discussing groups in general, we will use multiplicative notation. It is 
customary to reserve additive notation a + b for commutative laws of composition, 
that is, when a + b = b + a for all a,b. Multiplicative notation carries no implica- 
tion either way concerning commutativity. 

In additive notation the associative law is (a + b) + c = a + (b + c), and in functional notation it is

$f(f(a, b), c) = f(a, f(b, c)).$

This ugly formula illustrates the fact that functional notation isn't convenient for algebraic manipulation.

The associative law is more fundamental than the commutative law; one reason for this is that composition of functions, our second example of a law of composition, is associative. Let T be a set, and let g, f be functions (or maps) from T to T. Let $g \circ f$ denote the composed map $t \mapsto g(f(t))$. The rule

$g, f \mapsto g \circ f$

is a law of composition on the set S = Maps(T, T) of all maps $T \to T$.

As is true for matrix multiplication, composition of functions is an associative law. For if f, g, h are three maps from T to itself, then $(h \circ g) \circ f = h \circ (g \circ f)$:

$T \xrightarrow{\ f\ } T \xrightarrow{\ g\ } T \xrightarrow{\ h\ } T$, where $g \circ f$ spans the first two arrows and $h \circ g$ the last two.

This is clear, since both of the composed maps send $t \mapsto h(g(f(t)))$.
The simplest example is that T is a set of two elements {a, b}. Then there are four maps $T \to T$:

i: the identity map, defined by i(a) = a, i(b) = b;
τ: the transposition, defined by τ(a) = b, τ(b) = a;
α: the constant function α(a) = α(b) = a;
β: the constant function β(a) = β(b) = b.

The law of composition on S can be exhibited in a multiplication table as follows:

(1.3)
        |  i   τ   α   β
     ---+----------------
      i |  i   τ   α   β
      τ |  τ   i   β   α
      α |  α   α   α   α
      β |  β   β   β   β

which is to be read in this way: the entry in the row labeled v and the column labeled u is the composition v ∘ u.

Thus τ ∘ α = β, while α ∘ τ = α. Composition of functions is not commutative.
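The composition table for the four maps of a two-element set can be generated mechanically. The Python sketch below stores each map as a dictionary; the names `i`, `tau`, `alpha`, `beta` stand for the identity, the transposition, and the two constant maps, and the helper `compose` is an illustrative choice rather than anything defined in the text.

```python
# The two-element set T = {a, b} and its four self-maps.
T = ("a", "b")
maps = {
    "i": {"a": "a", "b": "b"},       # identity
    "tau": {"a": "b", "b": "a"},     # transposition
    "alpha": {"a": "a", "b": "a"},   # constant map with value a
    "beta": {"a": "b", "b": "b"},    # constant map with value b
}

def compose(v, u):
    """Return the name of the map v o u, which sends t to v(u(t))."""
    composed = {t: maps[v][maps[u][t]] for t in T}
    return next(name for name, f in maps.items() if f == composed)

# table[(v, u)] is the entry in row v, column u.
table = {(v, u): compose(v, u) for v in maps for u in maps}

for v in maps:
    print(v.ljust(6), [table[(v, u)] for u in maps])
```

The table confirms that composition is associative on this set but not commutative, since the entries for (tau, alpha) and (alpha, tau) differ.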
Going back to a general law of composition, suppose we want to define the product of a string of n elements of a set:

$a_1a_2 \cdots a_n = ?$

There are various ways to do this using the given law, which tells us how to multiply two elements. For instance, we could first use the law to find the product $a_1a_2$, then multiply this element by $a_3$, and so on:

$((a_1a_2)a_3)a_4 \cdots.$

When n = 4, there are four other ways to combine the same elements; $(a_1a_2)(a_3a_4)$ is one of them. It can be proved by induction that if the law is associative, then all such products are equal. This allows us to speak of the product of an arbitrary string of elements.


(1.4) Proposition. Suppose an associative law of composition is given on a set S. There is a unique way to define, for every integer n ≥ 1, a product of n elements $a_1, \ldots, a_n$ of S (we denote it temporarily by $[a_1 \cdots a_n]$) with the following properties:

(i) the product $[a_1]$ of one element is the element itself;
(ii) the product $[a_1a_2]$ of two elements is given by the law of composition;
(iii) for any integer i with 1 ≤ i < n, $[a_1 \cdots a_n] = [a_1 \cdots a_i][a_{i+1} \cdots a_n]$.

The right side of equation (iii) means that the two products $[a_1 \cdots a_i]$ and $[a_{i+1} \cdots a_n]$ are formed first and the results are then multiplied using the given law of composition.


Proof. We use induction on n. The product is defined by (i) and (ii) for n ≤ 2, and it does satisfy (iii) when n = 2. Suppose that we know how to define the product of r elements when r ≤ n − 1, and that this product is the unique product satisfying (iii). We then define the product of n elements by the rule

$[a_1 \cdots a_n] = [a_1 \cdots a_{n-1}][a_n],$

where the terms on the right side are those already defined. If a product satisfying (iii) exists, then this formula gives the product because it is (iii) when i = n − 1. So if it exists, the product is unique. We must now check (iii) for i < n − 1:

$[a_1 \cdots a_n] = [a_1 \cdots a_{n-1}][a_n]$   (our definition)
$= ([a_1 \cdots a_i][a_{i+1} \cdots a_{n-1}])[a_n]$   (induction hypothesis)
$= [a_1 \cdots a_i]([a_{i+1} \cdots a_{n-1}][a_n])$   (associative law)
$= [a_1 \cdots a_i][a_{i+1} \cdots a_n]$   (induction hypothesis).

This completes the proof. We will drop the brackets from now on and denote the product by $a_1 \cdots a_n$. □
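Proposition (1.4) can be observed experimentally. The sketch below (the function names `bracketings` and `matmul2` and the sample data are illustrative assumptions) collects the results of every possible way of bracketing a string of elements. For an associative law such as multiplication of 2 × 2 matrices only one value appears; for subtraction, which is not associative, several do.

```python
def bracketings(elems, law):
    """Return the set of values obtained from every way of bracketing the string."""
    if len(elems) == 1:
        return {elems[0]}
    results = set()
    # Split the string as [a_1 ... a_i][a_{i+1} ... a_n] for each 1 <= i < n.
    for i in range(1, len(elems)):
        for left in bracketings(elems[:i], law):
            for right in bracketings(elems[i:], law):
                results.add(law(left, right))
    return results

def matmul2(a, b):
    """Product of two 2 x 2 matrices stored as tuples of rows (hashable)."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

matrices = (((1, 3), (0, 2)), ((1, 0), (2, 1)), ((0, 1), (1, 1)), ((2, 0), (1, 1)))
associative_values = bracketings(matrices, matmul2)
nonassociative_values = bracketings((8, 4, 2, 1), lambda a, b: a - b)

print(len(associative_values), sorted(nonassociative_values))
```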


An identity for a law of composition is an element e of S having the property that

(1.5)   ea = a and ae = a, for all a ∈ S.

There can be at most one identity element. For if e, e' were two such elements, then since e is an identity, ee' = e', and since e' is an identity, ee' = e. Thus e = e'.

Both of our examples, matrix multiplication and composition of functions, have an identity. For n × n matrices it is the identity matrix I, and for Maps(T, T) it is the identity map, which carries each element of T to itself.

Often the identity is denoted by 1 if the law of composition is written multiplicatively, or by 0 if it is written additively. These elements do not need to be related to the numbers 1 and 0, but they share the property of being identity elements for their laws of composition.

Suppose that our law of composition has an identity, and let us use the symbol 1 for it. An element a ∈ S is called invertible if there is another element b such that

ab = 1 and ba = 1.

As with matrix multiplication [Chapter 1 (1.17)], it follows from the associative law that the inverse is unique if it exists. It is denoted by $a^{-1}$:

$aa^{-1} = a^{-1}a = 1.$

Inverses multiply in the opposite order:

(1.6)   $(ab)^{-1} = b^{-1}a^{-1}.$

The proof is the same as for matrices [Chapter 1 (1.18)].

Power notation may be used for an associative law of composition:

(1.7)   $a^n = a \cdots a$ (n times)   (n ≥ 1)
        $a^0 = 1$   provided the identity exists
        $a^{-n} = a^{-1} \cdots a^{-1}$   provided a is invertible.

The usual rules for manipulation of powers hold:

(1.8)   $a^{r+s} = a^ra^s$ and $(a^r)^s = a^{rs}$.

It isn't advisable to introduce fraction notation

(1.9)   $\frac{b}{a}$

unless the law of composition is commutative, for it is not clear from the notation whether the fraction stands for $ba^{-1}$ or $a^{-1}b$, and these two elements may be different.

When additive notation is used for the law of composition, the inverse is denoted by −a, and the power notation $a^n$ is replaced by the notation na = a + ··· + a, as with addition of real numbers.


(1.10) Definition. A group is a set G together with a law of composition which is 
associative and has an identity element, and such that every element of G has an 
inverse. 


It is customary to denote the group and the set of its elements by the same symbol.

An abelian group is a group whose law of composition is commutative. Additive notation is often used for abelian groups. Here are some simple examples of abelian groups:

(1.11)  $\mathbb{Z}^+$: the integers, with addition;
        $\mathbb{R}^+$: the real numbers, with addition;
        $\mathbb{R}^\times$: the nonzero real numbers, with multiplication;
        $\mathbb{C}^+, \mathbb{C}^\times$: the analogous groups, where the set $\mathbb{C}$ of complex numbers replaces the real numbers $\mathbb{R}$.

Here is an important property of groups:

(1.12) Proposition. Cancellation Law: Let a, b, c be elements of a group G. If ab = ac, then b = c. If ba = ca, then b = c.

Proof. Multiply both sides of ab = ac by $a^{-1}$ on the left: $b = a^{-1}ab = a^{-1}ac = c$. □

Multiplication by $a^{-1}$ in this proof is not a trick; it is essential. If an element a is not invertible, the cancellation law need not hold. For instance, 0·1 = 0·2, or


| ||: _f1 1 
Ziel Bi: 
The two most basic examples of groups are obtained from the examples of laws of composition that we have considered (multiplication of matrices and composition of functions) by leaving out the elements which are not invertible. As we remarked in Chapter 1, the n × n general linear group is the group of all invertible n × n matrices. It is denoted by

(1.13)   $GL_n = \{n \times n \text{ matrices } A \text{ with } \det A \neq 0\}.$

If we want to indicate that we are working with real or complex matrices, we write

$GL_n(\mathbb{R})$ or $GL_n(\mathbb{C})$,

according to the case.

In the set S = Maps(T, T) of functions, a map $f: T \to T$ has an inverse function if and only if it is bijective. Such a map is also called a permutation of T. The set of permutations forms a group. In Example (1.3), the invertible elements are i and τ, and they form a group with two elements. These two elements are the permutations of the set {a, b}.

The group of permutations of the set {1, 2, ..., n} of integers from 1 to n is called the symmetric group and is denoted by $S_n$:

(1.14)   $S_n$ = group of permutations of {1, ..., n}.

Because there are n! permutations of a set of n elements, this group contains n! elements. (We say that the order of the group is n!.) The symmetric group $S_2$ consists of the two elements i and τ, where i denotes the identity permutation and τ denotes the transposition which interchanges 1, 2 as in (1.3). The group law, composition of functions, is described by the fact that i is the identity element and by the relation $\tau\tau = \tau^2 = i$.

The structure of $S_n$ becomes complicated very rapidly as n increases, but we can work out the case n = 3 fairly easily. The symmetric group $S_3$ contains six elements. It will be an important example for us because it is the smallest group whose law of composition is not commutative. To describe this group, we pick two particular permutations x, y in terms of which we can write all others. Let us take for x the cyclic permutation of the indices. It is represented by matrix (4.3) from Chapter 1:

(1.15)   $x = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$

For y, we take the transposition which interchanges 1, 2, fixing 3:

(1.16)   $y = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$

The six permutations of {1, 2, 3} are

(1.17)   $\{1, x, x^2, y, xy, x^2y\} = \{x^iy^j \mid 0 \le i \le 2,\ 0 \le j \le 1\},$

where 1 denotes the identity permutation. This can be verified by computing the products.

The rules

(1.18)   $x^3 = 1,\quad y^2 = 1,\quad yx = x^2y$


can also be verified directly. They suffice for computation in the group $S_3$. Any product of the elements x, y and of their inverses, such as $x^{-1}y^3x^2y$ for instance, can be brought into the form $x^iy^j$ with 0 ≤ i ≤ 2 and 0 ≤ j ≤ 1 by applying the above rules repeatedly. To do so, we move all occurrences of y to the right side using the last relation and bring the exponents into the indicated ranges using the first two relations:

$x^{-1}y^3x^2y = x^2yx^2y = x^2(yx)xy = x^2x^2yxy = \cdots = x^6y^2 = 1.$

Therefore one can write out a complete multiplication table for $S_3$ with the aid of these rules. Because of this, the rules are called defining relations for the group, a concept which we will study formally in Chapter 6.

Note that the commutative law does not hold in $S_3$, because yx ≠ xy.
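The relations for the symmetric group on three letters can be checked by direct matrix computation. In the Python sketch below, x is taken to be the 3 × 3 permutation matrix of a cyclic permutation and y the matrix of the transposition interchanging 1 and 2; the helper names `mul` and `product` are illustrative, and matrices are stored as tuples of rows so that they can be compared directly.

```python
from functools import reduce

I3 = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
x = ((0, 1, 0), (0, 0, 1), (1, 0, 0))   # cyclic permutation matrix
y = ((0, 1, 0), (1, 0, 0), (0, 0, 1))   # transposition matrix

def mul(a, b):
    """Product of two 3 x 3 matrices stored as tuples of rows."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def product(*factors):
    """Product of a string of matrices, taken from left to right."""
    return reduce(mul, factors, I3)

x2 = product(x, x)

# The three relations: x^3 = 1, y^2 = 1, yx = x^2 y.
relations_hold = (product(x, x, x) == I3
                  and product(y, y) == I3
                  and product(y, x) == product(x2, y))

# The six elements x^i y^j with 0 <= i <= 2, 0 <= j <= 1.
elements = [I3, x, x2, y, product(x, y), product(x2, y)]

# The sample word x^-1 y^3 x^2 y, using x^-1 = x^2.
word = product(x2, y, y, y, x, x, y)

print(relations_hold, len(set(elements)), word == I3, product(y, x) != product(x, y))
```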


2. SUBGROUPS


One reason that the general linear group and the symmetric group are so important is that many other groups are contained in them as subgroups. A subset H of a group G is called a subgroup if it has the following properties:

(2.1)   (a) Closure: If a ∈ H and b ∈ H, then ab ∈ H.
        (b) Identity: 1 ∈ H.
        (c) Inverses: If a ∈ H, then $a^{-1}$ ∈ H.

These conditions are explained as follows: The first condition (a) tells us that the law of composition on the group G can be used to define a law on H, called the induced law of composition. The second and third conditions (b, c) say that H is a group with respect to this induced law. Notice that (2.1) mentions all parts of the definition of a group except for the associative law. We do not need to mention associativity. It carries over automatically from G to H.

Every group has two obvious subgroups: the whole group and the subgroup {1} consisting of the identity element alone. A subgroup is said to be a proper subgroup if it is not one of these two.

Here are two examples of subgroups: 


(2.2) Examples.
(a) The set T of invertible upper triangular 2 × 2 matrices

$\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$   (a, d ≠ 0)

is a subgroup of the general linear group $GL_2(\mathbb{R})$.
(b) The set of complex numbers of absolute value 1 (the set of points on the unit circle in the complex plane) is a subgroup of $\mathbb{C}^\times$.


As a further example, we will determine the subgroups of the additive group $\mathbb{Z}^+$ of integers. Let us denote the subset of $\mathbb{Z}$ consisting of all multiples of a given integer b by $b\mathbb{Z}$:

(2.3)   $b\mathbb{Z} = \{n \in \mathbb{Z} \mid n = bk \text{ for some } k \in \mathbb{Z}\}.$


(2.4) Proposition. For any integer b, the subset $b\mathbb{Z}$ is a subgroup of $\mathbb{Z}^+$. Moreover, every subgroup H of $\mathbb{Z}^+$ is of the type $H = b\mathbb{Z}$ for some integer b.


Proof. We leave the verification that $b\mathbb{Z}$ is a subgroup as an exercise and proceed to show that every subgroup has this form. Let H be a subgroup of $\mathbb{Z}^+$. Remember that the law of composition on $\mathbb{Z}^+$ is addition, the identity element is 0, and the inverse of a is −a. So the axioms for a subgroup read

(i) if a ∈ H and b ∈ H, then a + b ∈ H;
(ii) 0 ∈ H;
(iii) if a ∈ H, then −a ∈ H.

By axiom (ii), 0 ∈ H. If 0 is the only element of H, then $H = 0\mathbb{Z}$, so that case is settled. If not, there is a positive integer in H. For let a ∈ H be any nonzero element. If a is negative, then −a is positive, and axiom (iii) tells us that −a is in H. We choose for b the smallest positive integer in H, and we claim that $H = b\mathbb{Z}$. We first show that $b\mathbb{Z} \subset H$, in other words, that bk ∈ H for every integer k. If k is a positive integer, then bk = b + b + ··· + b (k terms). This element is in H by axiom (i) and induction. So is b(−k) = −bk, by axiom (iii). Finally, axiom (ii) tells us that b0 = 0 ∈ H.

Next we show that $H \subset b\mathbb{Z}$, that is, that every element n ∈ H is an integer multiple of b. We use division with remainder to write n = bq + r, where q, r are integers and where the remainder r is in the range 0 ≤ r < b. Then n and bq are both in H, and axioms (iii) and (i) show that r = n − bq is in H too. Now by our choice, b is the smallest positive integer in H, while 0 ≤ r < b. Therefore r = 0, and $n = bq \in b\mathbb{Z}$, as required. □


The elements of the subgroup $b\mathbb{Z}$ can be described as the integers which are divisible by b. This description leads to a striking application of Proposition (2.4) to subgroups which are generated by two integers a, b. Let us assume that a and b are not both zero. The set

(2.5)   $a\mathbb{Z} + b\mathbb{Z} = \{n \in \mathbb{Z} \mid n = ar + bs \text{ for some integers } r, s\}$

is a subgroup of $\mathbb{Z}^+$. It is called the subgroup generated by a and b, because it is the smallest subgroup which contains both of these elements. Proposition (2.4) tells us that this subgroup has the form $d\mathbb{Z}$ for some integer d, so it is the set of integers which are divisible by d. The generator d is called the greatest common divisor of a and b, for reasons which are explained in the following proposition:


(2.6) Proposition. Let a, b be integers, not both zero, and let d be the positive integer which generates the subgroup $a\mathbb{Z} + b\mathbb{Z}$. Then

(a) d can be written in the form d = ar + bs for some integers r and s.
(b) d divides a and b.
(c) If an integer e divides a and b, it also divides d.


Proof. The first assertion (a) just restates the fact that d is contained in $a\mathbb{Z} + b\mathbb{Z}$. Next, notice that a and b are in the subgroup $d\mathbb{Z} = a\mathbb{Z} + b\mathbb{Z}$. Therefore d divides a and b. Finally, if e is an integer which divides a and b, then a and b are in $e\mathbb{Z}$. This being so, any integer n = ar + bs is also in $e\mathbb{Z}$. By assumption, d has this form, so e divides d. □


If two integers a, b are given, one way to find their greatest common divisor is to factor each of them into prime integers and then collect the common ones. Thus the greatest common divisor of 36 = 2·2·3·3 and 60 = 2·2·3·5 is 12 = 2·2·3. But without Proposition (2.6), the fact that the integer determined by this method has the form ar + bs would not be clear at all. (In our example, 12 = 36·2 − 60·1.) We will discuss the applications of this fact to arithmetic in Chapter 11.
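Integers r and s with d = ar + bs can be found without factoring. The sketch below uses the extended Euclidean algorithm, a standard technique that is not the method described in this passage; the function name `extended_gcd` is an illustrative choice. It maintains the invariant that each remainder is an integer combination of a and b.

```python
def extended_gcd(a, b):
    """Return (d, r, s) with d = gcd(a, b) >= 0 and d == a*r + b*s."""
    # Invariants: old_d == a*old_r + b*old_s and d == a*r + b*s.
    old_d, d = a, b
    old_r, r = 1, 0
    old_s, s = 0, 1
    while d != 0:
        q = old_d // d
        old_d, d = d, old_d - q * d
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_d < 0:
        # Normalize so that the generator is the positive one.
        old_d, old_r, old_s = -old_d, -old_r, -old_s
    return old_d, old_r, old_s

d, r, s = extended_gcd(36, 60)
print(d, r, s, 36 * r + 60 * s)
```

The pair (r, s) is not unique; any pair returned only needs to satisfy d = ar + bs.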

We now come to an important abstract example of a subgroup, the cyclic subgroup generated by an arbitrary element x of a group G. We use multiplicative notation. The cyclic subgroup H generated by x is the set of all powers of x:

(2.7)   $H = \{\ldots, x^{-2}, x^{-1}, 1, x, x^2, \ldots\}.$

It is a subgroup of G, the smallest subgroup which contains x. But to interpret (2.7) correctly, we must remember that $x^n$ is a notation for a certain element of G. It may happen that there are repetitions in the list. For example, if x = 1, then all elements in the list are equal to 1. We may distinguish two possibilities: Either the powers of x are all distinct elements, or they are not. In the first case, the group H is called infinite cyclic.

Suppose we have the second case, so that two powers are equal, say $x^n = x^m$, where n > m. Then $x^{n-m} = 1$ [Cancellation Law (1.12)], and so there is a nonzero power of x which is equal to 1.


(2.8) Lemma. The set S of integers n such that x" = | is a subgroup of Z*. 


Proof. \f x™ = 1 and x” = 1, then x”*" = x™x" = 1 too. This shows that 
m+n€ES if m,n € S. So axiom (1) for a subgroup is verified. Also, axiom (ii) 
holds because x° = 1. Finally, if x" = 1, then x” = x"x" = x° = 1. Thus 
“ee ot n © Sou 


It follows from Lemma (2.8) and Proposition (2.4) that $S = m\mathbb{Z}$, where m is
the smallest positive integer such that $x^m = 1$. The m elements $1, x, \ldots, x^{m-1}$ are all
different. (If $x^i = x^j$ with $0 \le i < j < m$, then $x^{j-i} = 1$. But $j - i < m$, so this
is impossible.) Moreover, any power $x^n$ is equal to one of them: By division with
remainder, we may write $n = mq + r$ with remainder r less than m. Then
$x^n = (x^m)^q x^r = x^r$. Thus H consists of the following m elements:


(2.9) $H = \{1, x, \ldots, x^{m-1}\}$, these powers are distinct, and $x^m = 1$.


Such a group is called a cyclic group of order m.
The order of any group G is the number of its elements. We will often denote 
the order by 


(2.10) |G| = number of elements of G. 


Of course, the order may be infinite. 

An element of a group is said to have order m (possibly infinity) if the cyclic 
subgroup it generates has order m. This means that m is the smallest positive integer 
with the property $x^m = 1$ or, if the order is infinite, that $x^m \ne 1$ for all $m \ne 0$.
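
The definition of order can be tested by brute force on small examples. The sketch below computes the order of a 2 × 2 integer matrix by taking successive powers. The helper names and the search bound `limit` are our own assumptions, and a return value of `None` only means that no power up to the bound is the identity; it does not prove that the order is infinite.

```python
IDENTITY = ((1, 0), (0, 1))


def mat_mul(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def order(x, limit=100):
    """Smallest m >= 1 with x^m = identity, or None if no such m <= limit exists."""
    power = x
    for m in range(1, limit + 1):
        if power == IDENTITY:
            return m
        power = mat_mul(power, x)
    return None


print(order(((1, 1), (-1, 0))))  # 6
print(order(((1, 1), (0, 1))))   # None: no power up to the limit is the identity
```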


For example, the matrix $\begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix}$ is an element of order 6 in $GL_2(\mathbb{R})$, so the
cyclic subgroup it generates has order 6. On the other hand, the matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ has
infinite order, because


$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^n = \begin{bmatrix} 1 & n \\ 0 & 1 \end{bmatrix}.$


We may also speak of the subgroup of a group G generated by a subset U. This
is the smallest subgroup of G containing U, and it consists of all elements of G
which can be expressed as a product of a string of elements of U and of their inverses.
In particular, a subset U of G is said to generate G if every element of G is
such a product. For example, we saw in (1.17) that the set U = {x, y} generates the
symmetric group $S_3$. Proposition (2.18) of Chapter 1 shows that the elementary matrices
generate $GL_n$.

The Klein four group V is the simplest group which is not cyclic. It will appear
in many forms. For instance, it can be realized as the group consisting of the four
matrices


(2.11) $\begin{bmatrix} \pm 1 & 0 \\ 0 & \pm 1 \end{bmatrix}.$


Any two elements different from the identity generate V. 
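The quaternion group H is another example of a small subgroup of $GL_2(\mathbb{C})$
which is not cyclic. It consists of the eight matrices


(2.12) $H = \{\pm\mathbf{1}, \pm\mathbf{i}, \pm\mathbf{j}, \pm\mathbf{k}\},$


where
$\mathbf{1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \mathbf{i} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \quad \mathbf{j} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad \mathbf{k} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}.$


The two elements $\mathbf{i}, \mathbf{j}$ generate H, and computation leads to the formulas
(2.13) $\mathbf{i}^4 = \mathbf{1}, \quad \mathbf{i}^2 = \mathbf{j}^2, \quad \mathbf{j}\mathbf{i} = \mathbf{i}^3\mathbf{j}.$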


These products determine the multiplication table of H. 
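
The relations (2.13) and the count of eight elements can be checked numerically. The sketch below assumes the usual matrix realization of the generators, i = diag(i, −i) and j with rows (0, 1), (−1, 0); the helper names are ours.

```python
ONE = ((1, 0), (0, 1))
I = ((1j, 0), (0, -1j))
J = ((0, 1), (-1, 0))


def mul(a, b):
    """Product of two 2x2 complex matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )


def power(a, n):
    result = ONE
    for _ in range(n):
        result = mul(result, a)
    return result


# The relations (2.13).
assert power(I, 4) == ONE
assert mul(I, I) == mul(J, J)
assert mul(J, I) == mul(power(I, 3), J)

# Close {i, j} under multiplication; in a finite group this yields the generated subgroup.
elements = {I, J}
while True:
    bigger = elements | {mul(a, b) for a in elements for b in elements}
    if bigger == elements:
        break
    elements = bigger

print(len(elements))  # 8
```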


3. ISOMORPHISMS


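Let G and G' be two groups. We want to say that they are isomorphic if all properties
of the group structure of G hold for G' as well, and conversely. For example, let
G be the set of real matrices of the form


$\begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix}.$


This is a subgroup of $GL_2(\mathbb{R})$, and the product of two such matrices is


$\begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & y \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & x + y \\ 0 & 1 \end{bmatrix}.$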


The upper right entries of the matrices add when the matrices are multiplied, the rest 
of the matrix being fixed. So when computing with such matrices, we need to keep 
track of only the upper right entry. This fact is expressed formally by saying that the 
group G is isomorphic to the additive group of real numbers. 

How to make the concept of isomorphism precise will not be immediately
clear, but it turns out that the right way is to relate two groups by a bijective corre- 
spondence between their elements, compatible with the laws of composition, that is, a 
correspondence 


(3.1) $G \longleftrightarrow G'$
having this property: If $a, b \in G$ correspond to $a', b' \in G'$, then the product ab in
G corresponds to the product a'b' in G'. When this happens, all properties of the
group structure carry over from one group to the other.

For example, the identity elements in isomorphic groups G and G' correspond.
To see this, say that the identity element 1 of G corresponds to an element $\epsilon'$ in G'.
Let a' be an arbitrary element of G', and let a be the corresponding element of G.
By assumption, products correspond to products. Since 1a = a in G, it follows that
$\epsilon' a' = a'$ in G'. In this way, one shows that $\epsilon' = 1'$. Another example: The orders
of corresponding elements are equal. If a corresponds to a' in G', then, since
the correspondence is compatible with multiplication, $a^r = 1$ if and only if
$a'^r = 1'$.

Since two isomorphic groups have the same properties, it is often convenient
to identify them with each other when speaking informally. For example, the symmetric
group $S_n$ of permutations of {1,...,n} is isomorphic to the group of permutation
matrices, a subgroup of $GL_n(\mathbb{R})$, and we often blur the distinction between
these two groups.

We usually write the correspondence (3.1) asymmetrically as a function, or
map $\varphi: G \to G'$. Thus an isomorphism $\varphi$ from G to G' is a bijective map which is
compatible with the laws of composition. If we write out what this compatibility
means using function notation for $\varphi$, we get the condition


(3.2) $\varphi(ab) = \varphi(a)\varphi(b)$, for all $a, b \in G$.


The left side of this equality means to multiply a and b in G and then apply $\varphi$, while
on the right the elements $\varphi(a)$ and $\varphi(b)$, which we denoted by a', b' before, are
multiplied in G'. We could also write this condition as


$(ab)' = a'b'.$


Of course the choice of G as domain for this isomorphism is arbitrary. The inverse
function $\varphi^{-1}: G' \to G$ would serve just as well.

Two groups G and G' are called isomorphic if there exists an isomorphism
$\varphi: G \to G'$. We will sometimes indicate that two groups are isomorphic by the
symbol $\approx$:


(3.3) $G \approx G'$ means G is isomorphic to G'.


For example, let $C = \{\ldots, a^{-2}, a^{-1}, 1, a, a^2, \ldots\}$ be an infinite cyclic group.
Then the map
$\varphi: \mathbb{Z}^+ \to C$
defined by $\varphi(n) = a^n$ is an isomorphism. Since the notation is additive in the domain
and multiplicative in the range, condition (3.2) translates in this case to
$\varphi(m + n) = \varphi(m)\varphi(n)$, or


$a^{m+n} = a^m a^n.$

One more simple example:

Let $G = \{1, x, x^2, \ldots, x^{n-1}\}$ and $G' = \{1, y, y^2, \ldots, y^{n-1}\}$ be two cyclic groups, generated
by elements x, y of the same order. Then the map which sends $x^i$ to $y^i$ is an
isomorphism: Two cyclic groups of the same order are isomorphic.
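
For finite groups, condition (3.2) and bijectivity can be verified exhaustively. The sketch below uses two cyclic groups of order 6 chosen for illustration and not taken from the text: the integers modulo 6 under addition, and the nonzero residues modulo 7 under multiplication, which are generated by 3. The function `is_isomorphism` is our own helper.

```python
def is_isomorphism(phi, G, op_G, G_prime, op_G_prime):
    """Check that phi: G -> G' is bijective and satisfies phi(ab) = phi(a)phi(b)."""
    bijective = len(G) == len(G_prime) and {phi(a) for a in G} == set(G_prime)
    compatible = all(
        phi(op_G(a, b)) == op_G_prime(phi(a), phi(b)) for a in G for b in G
    )
    return bijective and compatible


G = range(6)                      # integers mod 6, written additively
G_prime = [1, 2, 3, 4, 5, 6]      # nonzero residues mod 7, written multiplicatively


def add6(a, b):
    return (a + b) % 6


def mul7(a, b):
    return (a * b) % 7


print(is_isomorphism(lambda k: pow(3, k, 7), G, add6, G_prime, mul7))  # True
print(is_isomorphism(lambda k: pow(2, k, 7), G, add6, G_prime, mul7))  # False
```

The second map is compatible with the laws of composition but is not bijective, because 2 has order 3 modulo 7.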

Recapitulating, two groups G and G' are isomorphic if there exists an isomorphism
$\varphi: G \to G'$, a bijective map compatible with the laws of composition. The
groups isomorphic to a given group G form what is called the isomorphism class of 
G, and any two groups in an isomorphism class are isomorphic. When one speaks of 
classifying groups, what is meant is to describe the isomorphism classes. This is too 
hard to do for all groups, but we will see later that there is, for example, one iso- 
morphism class of groups of order 3 [see (6.13)], and that there are two classes of 
groups of order 4 and five classes of groups of order 12 [Chapter 6 (5.1)]. 

A confusing point about isomorphisms is that there exist isomorphisms from a
group G to itself:

$\varphi: G \to G.$


Such an isomorphism is called an automorphism of G. The identity map is an automorphism,
of course, but there are nearly always other automorphisms as well. For
example, let $G = \{1, x, x^2\}$ be a cyclic group of order 3, so that $x^3 = 1$. The transposition
which interchanges x and $x^2$ is an automorphism of G:

$1 \mapsto 1, \quad x \mapsto x^2, \quad x^2 \mapsto x.$

This is because $x^2$ is another element of order 3 in the group. If we call this element
y, the cyclic subgroup $\{1, y, y^2\}$ generated by y is the whole group G, because
$y^2 = x$. The automorphism compares the two realizations of G as a cyclic group.
The most important example of automorphism is conjugation: Let $b \in G$ be a
fixed element. Then conjugation by b is the map $\varphi$ from G to itself defined by


(3.4) $\varphi(x) = bxb^{-1}.$


This is an automorphism because, first of all, it is compatible with multiplication in
the group:

$\varphi(xy) = bxyb^{-1} = bxb^{-1}byb^{-1} = \varphi(x)\varphi(y),$
and, secondly, it is a bijective map since it has an inverse function, namely conjugation
by $b^{-1}$. If the group is abelian, then conjugation is the identity map:
$bab^{-1} = abb^{-1} = a$. But any noncommutative group has some nontrivial conjugations,
and so it has nontrivial automorphisms.

The element $bab^{-1}$ is called the conjugate of a by b and will appear often. Two
elements a, a' of a group G are called conjugate if $a' = bab^{-1}$ for some $b \in G$.
The conjugate behaves in much the same way as the element a itself; for example, it
has the same order in the group. This follows from the fact that it is the image of a
by an automorphism.

The conjugate has a useful, though trivial, interpretation. Namely, if we denote
$bab^{-1}$ by a', then

(3.5) $ba = a'b.$


So we can think of conjugation by b as the change in a which results when one
moves b from one side to the other.
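
The statements about conjugation (3.4) can be checked exhaustively in a small group. The sketch below works in the symmetric group on three letters, with permutations stored as tuples; this encoding and the helper names are our assumptions.

```python
from itertools import permutations


def compose(p, q):
    """Return p o q, where (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def order(p):
    identity = tuple(range(len(p)))
    power, m = p, 1
    while power != identity:
        power, m = compose(power, p), m + 1
    return m


def conjugation(b):
    """The map x -> b x b^-1."""
    b_inv = inverse(b)
    return lambda x: compose(compose(b, x), b_inv)


S3 = list(permutations(range(3)))

for b in S3:
    phi = conjugation(b)
    # Compatible with multiplication, bijective, and order-preserving.
    assert all(phi(compose(x, y)) == compose(phi(x), phi(y)) for x in S3 for y in S3)
    assert sorted(phi(x) for x in S3) == sorted(S3)
    assert all(order(phi(x)) == order(x) for x in S3)
    # Relation (3.5): b a = a' b, where a' = b a b^-1.
    assert all(compose(b, a) == compose(phi(a), b) for a in S3)

nontrivial = [b for b in S3 if any(conjugation(b)(x) != x for x in S3)]
print(len(nontrivial))  # 5: every element except the identity conjugates nontrivially
```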


4. HOMOMORPHISMS 


Let G, G' be groups. A homomorphism $\varphi: G \to G'$ is any map satisfying the rule
(4.1) $\varphi(ab) = \varphi(a)\varphi(b),$


for all $a, b \in G$. This is the same requirement as for an isomorphism [see (3.2)].
The difference is that $\varphi$ is not assumed to be bijective here.


(4.2) Examples. The following maps are homomorphisms:


(a) the determinant function $\det: GL_n(\mathbb{R}) \to \mathbb{R}^\times$;
(b) the sign of a permutation $\operatorname{sign}: S_n \to \{\pm 1\}$ [see Chapter 1 (4.9)];
(c) the map $\varphi: \mathbb{Z}^+ \to G$ defined by $\varphi(n) = a^n$, where a is a fixed element of G;
(d) the inclusion map $i: H \to G$ of a subgroup H into a group G, defined by
$i(x) = x$.


(4.3) Proposition. A group homomorphism $\varphi: G \to G'$ carries the identity to
the identity, and inverses to inverses. In other words, $\varphi(1_G) = 1_{G'}$, and
$\varphi(a^{-1}) = \varphi(a)^{-1}$.

Proof. Since $1 = 1 \cdot 1$ and since $\varphi$ is a homomorphism,
$\varphi(1) = \varphi(1 \cdot 1) = \varphi(1)\varphi(1)$. Cancel $\varphi(1)$ from both sides by (1.12): $1 = \varphi(1)$.
Next, $\varphi(a^{-1})\varphi(a) = \varphi(a^{-1}a) = \varphi(1) = 1$, and similarly $\varphi(a)\varphi(a^{-1}) = 1$. Hence
$\varphi(a^{-1}) = \varphi(a)^{-1}$. □


Every group homomorphism $\varphi$ determines two important subgroups: its image
and its kernel. The image of a homomorphism $\varphi: G \to G'$ is easy to understand. It
is the image of the map
(4.4) $\operatorname{im} \varphi = \{x \in G' \mid x = \varphi(a) \text{ for some } a \in G\},$
and it is a subgroup of G'. Another notation for the image is $\varphi(G)$. In Examples
(4.2a,b), the image is equal to the range of the map, but in Example (4.2c) it is the
cyclic subgroup of G generated by a, and in Example (4.2d) it is the subgroup H.

The kernel of $\varphi$ is more subtle. It is the set of elements of G which are mapped
to the identity in G':
(4.5) $\ker \varphi = \{a \in G \mid \varphi(a) = 1\},$
which can also be described as the inverse image $\varphi^{-1}(1)$ of the identity element [see
Appendix (1.5)]. The kernel is a subgroup of G, because if a and b are in $\ker \varphi$,
then $\varphi(ab) = \varphi(a)\varphi(b) = 1 \cdot 1 = 1$, hence $ab \in \ker \varphi$, and so on.

The kernel of the determinant homomorphism is the subgroup of matrices
whose determinant is 1. This subgroup is called the special linear group and is denoted
by $SL_n(\mathbb{R})$:


(4.6) $SL_n(\mathbb{R}) = \{\text{real } n \times n \text{ matrices } A \mid \det A = 1\},$


a subgroup of $GL_n(\mathbb{R})$. The kernel of the sign homomorphism in Example (4.2b)
above is called the alternating group and is denoted by $A_n$:


(4.7) $A_n = \{\text{even permutations}\},$


a subgroup of $S_n$. The kernel of the homomorphism (4.2c) is the set of integers n
such that $a^n = 1$. That this is a subgroup of $\mathbb{Z}^+$ was proved before, in (2.8).

In addition to being a subgroup, the kernel of a homomorphism has an extra
property which is subtle but very important. Namely, if a is in $\ker \varphi$ and b is any
element of the group G, then the conjugate $bab^{-1}$ is in $\ker \varphi$. For to say $a \in \ker \varphi$
means $\varphi(a) = 1$. Then


$\varphi(bab^{-1}) = \varphi(b)\varphi(a)\varphi(b^{-1}) = \varphi(b) \, 1 \, \varphi(b)^{-1} = 1,$
so $bab^{-1} \in \ker \varphi$ too.


(4.8) Definition. A subgroup N of a group G is called a normal subgroup if it has
the following property: For every $a \in N$ and every $b \in G$, the conjugate $bab^{-1}$ is
in N.


As we have just seen,
(4.9) The kernel of a homomorphism is a normal subgroup.
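
Statement (4.9) can be illustrated with the sign homomorphism on the symmetric group on four letters. In the sketch below, permutations are stored as tuples and the sign is computed by counting inversions; both choices, and the helper `is_normal`, are our assumptions. For contrast, a subgroup of order 2 generated by a transposition fails the test of Definition (4.8).

```python
from itertools import permutations


def compose(p, q):
    """Return p o q, where (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def sign(p):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def is_normal(N, G):
    """Check that b a b^-1 lies in N for every a in N and every b in G."""
    members = set(N)
    return all(compose(compose(b, a), inverse(b)) in members for a in N for b in G)


S4 = list(permutations(range(4)))
assert all(sign(compose(a, b)) == sign(a) * sign(b) for a in S4 for b in S4)

kernel = [p for p in S4 if sign(p) == 1]      # the even permutations
H = [(0, 1, 2, 3), (1, 0, 2, 3)]              # identity and one transposition

print(len(kernel), is_normal(kernel, S4))  # 12 True
print(is_normal(H, S4))                    # False
```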


Thus $SL_n(\mathbb{R})$ is a normal subgroup of $GL_n(\mathbb{R})$, and $A_n$ is a normal subgroup of $S_n$.
Any subgroup of an abelian group G is normal, because when G is abelian,
$bab^{-1} = a$. But subgroups need not be normal in nonabelian groups. For example,
the group T of invertible upper triangular matrices is not a normal subgroup of $GL_2(\mathbb{R})$.
For let $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. Then $BAB^{-1} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. Here $A \in T$ and
$B \in GL_2(\mathbb{R})$, but $BAB^{-1} \notin T$.

The center of a group G, sometimes denoted by Z or by Z(G), is the set of elements
which commute with every element of G:


(4.10) $Z = \{z \in G \mid zx = xz \text{ for all } x \in G\}.$


The center of any group is a normal subgroup of the group. For example, it can be
shown that the center of $GL_n(\mathbb{R})$ is the group of scalar matrices, that is, those of the
form cI.


5. EQUIVALENCE RELATIONS AND PARTITIONS


A fundamental mathematical construction is to start with a set S and to form a new 
set by equating certain elements of S according to a given rule. For instance, we 
may divide the set of integers into two classes, the even integers and the odd in- 
tegers. Or we may wish to view congruent triangles in the plane as equivalent geo- 
metric objects. This very general procedure arises in several ways, which we will 
discuss here. 


Let S be a set. By a partition P of S, we mean a subdivision of S into nonover- 
lapping subsets: 


(5.1) S = union of disjoint, nonempty subsets. 


For example, the sets 


{1, 3}, {2, 5}, {4}
form a partition of the set {1,2,3,4,5}. The two sets, of even integers and of odd 
integers, form a partition of the set Z of all integers. 
An equivalence relation on S is a relation which holds between certain ele- 
ments of S. We often write it as a ~ b and speak of it as equivalence of a and b. 


(5.2) An equivalence relation is required to be: 


(i) transitive: If a ~ b and b ~ c, then a ~ c;
(ii) symmetric: If a ~ b, then b ~ a;
(iii) reflexive: a ~ a for all $a \in S$.


Congruence of triangles is an example of an equivalence relation on the set S of tri- 
angles in the plane. 

Formally, a relation on S is the same thing as a subset R of the set S x S of 
pairs of elements; namely, the subset R consists of pairs (a,b) such that a ~ b. 
In terms of this subset, we can write the axioms for an equivalence relation as follows:
(i) if $(a,b) \in R$ and $(b,c) \in R$, then $(a,c) \in R$; (ii) if $(a,b) \in R$, then
$(b,a) \in R$; and (iii) $(a,a) \in R$ for all a.

The notions of a partition of S and an equivalence relation on S are logically 
equivalent, though in practice one is often presented with just one of the two: Given 
a partition P on S, we can define an equivalence relation R by the rule a ~ b if a 
and b lie in the same subset of the partition. Axioms (5.2) are obviously satisfied.
Conversely, given an equivalence relation R, we can define a partition P this way: 
The subset containing a is the set of all elements b such that a ~ b. This subset is 
called the equivalence class of a, and S is partitioned into equivalence classes. 

Let us check that the equivalence classes partition the set S. Call $C_a$ the equivalence
class of an element $a \in S$. So $C_a$ consists of the elements b such that a ~ b:


(5.3) $C_a = \{b \in S \mid a \sim b\}.$

The reflexive axiom tells us that $a \in C_a$. Therefore the classes $C_a$ are nonempty,
and since a can be any element, the classes cover S. The remaining property of a
partition which must be verified is that equivalence classes do not overlap. It is easy
to become confused here, because if a ~ b then by definition $b \in C_a$. But $b \in C_b$
too. Doesn't this show that $C_a$ and $C_b$ overlap? We must remember that the symbol
$C_a$ is our notation for a subset of S defined in a certain way. The partition consists of
the subsets, not of the notations. It is true that $C_a$ and $C_b$ have the element b in common,
but that is all right because these are two notations for the same set. We will
show the following:


(5.4) Suppose that $C_a$ and $C_b$ have an element d in common. Then $C_a = C_b$.


Let us first show that if a ~ b then $C_a = C_b$. To do so, let x be an arbitrary
element of $C_b$. Then b ~ x. Since a ~ b, transitivity shows that a ~ x, hence that
$x \in C_a$. Therefore $C_b \subset C_a$. The opposite inclusion follows from interchanging the
roles of a and b, which is permitted because the relation is symmetric. To prove (5.4),
suppose that d is in $C_a$ and in $C_b$; then a ~ d and
b ~ d. Then by what has been shown, $C_a = C_d = C_b$, as required.
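
The passage from an equivalence relation to its partition can be made concrete. In the sketch below, the set {0, ..., 9} and the relation "a and b have the same parity" are illustrative choices of ours; the code forms each class as in (5.3) and then checks (5.4).

```python
def equivalence_class(a, S, related):
    """The class of a: all b in S with a ~ b."""
    return frozenset(b for b in S if related(a, b))


def same_parity(a, b):
    return (a - b) % 2 == 0


S = range(10)

# Many elements a give the same class; the set keeps each class only once.
classes = {equivalence_class(a, S, same_parity) for a in S}

# (5.4): two classes with a common element are equal.
assert all(A == B or not (A & B) for A in classes for B in classes)
# The classes cover S.
assert set().union(*classes) == set(S)

print(sorted(sorted(c) for c in classes))  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```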


Suppose that an equivalence relation or a partition is given on a set S. Then we
may construct a new set $\bar{S}$ whose elements are the equivalence classes or the subsets
making up the partition. To simplify notation, the equivalence class of a, or the subset
of the partition containing a, is often denoted by $\bar{a}$. Thus $\bar{a}$ is an element of $\bar{S}$.

Notice that there is a natural surjective map


(5.5) $S \to \bar{S}$, which sends $a \mapsto \bar{a}$.

In our original example of the partition of $S = \mathbb{Z}$, the set $\bar{S}$ contains the two elements
(Even), (Odd), where the symbol (Even) represents the set of even integers
and (Odd) the set of odd integers. And $\bar{0} = \bar{2} = \bar{4}$ and so on. So we can denote the
set (Even) by any one of these symbols. The map


(5.6) $\mathbb{Z} \to \{(\text{Even}), (\text{Odd})\}$


is the obvious one.

There are two ways to think of this construction. We can imagine putting the
elements of S into separate piles, one for each subset of the partition, and then regarding
the piles as the elements of a new set $\bar{S}$. The map $S \to \bar{S}$ associates each
element with its pile. Or we can think of changing what we mean by equality among
elements of S, interpreting a ~ b to mean $\bar{a} = \bar{b}$ in $\bar{S}$. With this way of looking at
it, the elements in the two sets S and $\bar{S}$ correspond, but in $\bar{S}$ more of them are equal
to each other. It seems to me that this is the way we treat congruent triangles in
school. The bar notation (5.5) is well suited to this intuitive picture. We can work
with the same symbols as in S, but with bars over them to remind us of the new rule:


(5.7) $\bar{a} = \bar{b}$ means a ~ b.


This notation is often very convenient.

A disadvantage of the bar notation is that many symbols represent the same element
of $\bar{S}$. Sometimes this disadvantage can be overcome by choosing once and for
all a particular element, or a representative, in each equivalence class. For example,
it is customary to represent (Even) by $\bar{0}$ and (Odd) by $\bar{1}$:


(5.8) $\{(\text{Even}), (\text{Odd})\} = \{\bar{0}, \bar{1}\}.$


Though the pile picture is more immediate, the second way of viewing $\bar{S}$ is often
the better one, because operations on the piles are clumsy to visualize, whereas
the bar notation is well suited to algebraic manipulation.

Any map of sets $\varphi: S \to T$ defines an equivalence relation on the domain S,
namely the relation given by the rule a ~ b if $\varphi(a) = \varphi(b)$. We will refer to this as
the equivalence relation determined by the map. The corresponding partition is
made up of the nonempty inverse images of the elements of T. By definition, the inverse
image of an element $t \in T$ is the subset of S consisting of all elements s such
that $\varphi(s) = t$. It is denoted symbolically as


(5.9) $\varphi^{-1}(t) = \{s \in S \mid \varphi(s) = t\}.$


Thus $\varphi^{-1}(t)$ is a subset of the domain S, determined by the element $t \in T$. (This is
symbolic notation. Please remember that $\varphi^{-1}$ is usually not a function.) The inverse
images may also be called the fibres of the map $\varphi$. The fibres $\varphi^{-1}(t)$ which are
nonempty, which means t is in the image of $\varphi$, form a partition of S. Here the set $\bar{S}$
of equivalence classes, which is the set of nonempty fibres, has another incarnation,
as the image $\operatorname{im} \varphi$ of the map. Namely, there is a bijective map


(5.10) $\bar{\varphi}: \bar{S} \to \operatorname{im} \varphi,$


the map which sends an element $\bar{s}$ of $\bar{S}$ to $\varphi(s)$.
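
The nonempty fibres of a map are easy to compute for a finite domain. The sketch below groups the elements of S by their image; the squaring map on the integers from −4 to 4 is a hypothetical example of ours. The keys of the resulting dictionary are the elements of the image, which is the bijection (5.10) in concrete form.

```python
from collections import defaultdict


def fibres(phi, S):
    """Map each t in the image of phi to its fibre, the list of s in S with phi(s) = t."""
    result = defaultdict(list)
    for s in S:
        result[phi(s)].append(s)
    return dict(result)


S = range(-4, 5)
parts = fibres(lambda n: n * n, S)   # hypothetical map n -> n^2

# The fibres are disjoint and cover S.
assert sorted(s for fibre in parts.values() for s in fibre) == list(S)
# One fibre for each element of the image.
assert set(parts) == {n * n for n in S}

print(parts[4], parts[0])  # [-2, 2] [0]
```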

We now go back to group homomorphisms. Let $\varphi: G \to G'$ be a homomorphism,
and let us analyze the equivalence relation on G which is associated to the
map $\varphi$ or, equivalently, the fibres of the homomorphism. This relation is usually denoted
by $\equiv$, rather than by ~, and is referred to as congruence:


(5.11) $a \equiv b$ if $\varphi(a) = \varphi(b)$.

For example, let $\varphi: \mathbb{C}^\times \to \mathbb{R}^\times$ be the absolute value homomorphism defined
by $\varphi(a) = |a|$. The induced equivalence relation is $a \equiv b$ if $|a| = |b|$. The fibres
of this map are the concentric circles about 0. They are in bijective correspondence
with elements of $\operatorname{im} \varphi$, the set of positive reals.


(5.12) Figure. Fibres of the absolute value map $\mathbb{C}^\times \to \mathbb{R}^\times$.

The relation (5.11) can be rewritten in a number of ways, of which the following
will be the most important for us:


(5.13) Proposition. Let $\varphi: G \to G'$ be a group homomorphism with kernel N,
and let a, b be elements of G. Then $\varphi(a) = \varphi(b)$ if and only if b = an for some element
$n \in N$, or equivalently, if $a^{-1}b \in N$.


Proof. Suppose that $\varphi(a) = \varphi(b)$. Then $\varphi(a)^{-1}\varphi(b) = 1$, and since $\varphi$ is a homomorphism
we can use (4.1) and (4.3) to rewrite this equality as $\varphi(a^{-1}b) = 1$.
Now by definition, the kernel N is the set of all elements $x \in G$ such that $\varphi(x) = 1$.
Thus $a^{-1}b \in N$, or $a^{-1}b = n$ for some $n \in N$. Hence b = an, as required. Conversely,
if b = an and $n \in N$, then $\varphi(b) = \varphi(a)\varphi(n) = \varphi(a) 1 = \varphi(a)$. □


The set of elements of the form an is denoted by aN and is called a coset of N
in G:
(5.14) $aN = \{g \in G \mid g = an \text{ for some } n \in N\}.$
So the coset aN is the set of all group elements b which are congruent to a.
The congruence relation $a \equiv b$ partitions the group G into congruence classes, the
cosets aN. They are the fibres of the map $\varphi$. In particular, the circles about the
origin depicted in (5.12) are cosets of the kernel of the absolute value homomorphism.


(5.15) Figure. A schematic diagram of a group homomorphism. 


An important case to look at is when the kernel is the trivial subgroup. In that 
case (5.13) reads as follows: 


(5.16) Corollary. A group homomorphism $\varphi: G \to G'$ is injective if and only if
its kernel is the trivial subgroup {1}. □


This gives us a way to verify that a homomorphism is an isomorphism. To do so, we
check that $\ker \varphi = \{1\}$, so that $\varphi$ is injective, and also that $\operatorname{im} \varphi = G'$, that is, that
$\varphi$ is surjective.


6. COSETS 


One can define cosets for any subgroup H of a group G, not only for the kernel of a
homomorphism. A left coset is a subset of the form


(6.1) $aH = \{ah \mid h \in H\}.$


Note that the subgroup H is itself a coset, because H = 1H.
The cosets are equivalence classes for the congruence relation


(6.2) $a \equiv b$ if b = ah, for some $h \in H$.


Let us verify that congruence is an equivalence relation. Transitivity: Suppose that
$a \equiv b$ and $b \equiv c$. This means that b = ah and c = bh' for some $h, h' \in H$. Therefore
c = ahh'. Since H is a subgroup, $hh' \in H$. Thus $a \equiv c$. Symmetry: Suppose
$a \equiv b$, so that b = ah. Then $a = bh^{-1}$ and $h^{-1} \in H$, and so $b \equiv a$. Reflexivity:
a = a1 and $1 \in H$, so $a \equiv a$. Note that we have made use of all the defining properties
of a subgroup.

Since equivalence classes form a partition, we find the following: 


(6.3) Corollary. The left cosets of a subgroup partition the group. □


(6.4) Note. The notation aH defines a certain subset of G. As with any equivalence
relation, different notations may represent the same subset. In fact, we know
that aH is the unique coset containing a, and so


(6.5) aH = bH if and only if $a \equiv b$.
The corollary just restates (5.4):
(6.6) If aH and bH have an element in common, then they are equal.


For example, let G be the symmetric group $S_3$, with the presentation given in
(1.18): $G = \{1, x, x^2, y, xy, x^2y\}$. The element xy has order 2, and so it generates a
cyclic subgroup H = {1, xy} of order 2. The left cosets of H in G are the three sets


(6.7) $\{1, xy\} = H = xyH, \quad \{x, x^2y\} = xH = x^2yH, \quad \{x^2, y\} = x^2H = yH.$
Notice that they do partition the group.

The number of left cosets of a subgroup is called the index of H in G and is
denoted by


(6.8) $[G : H].$


Thus in our example the index is 3. Of course if G contains infinitely many elements,
the index may be infinite too.

Note that there is a bijective map from the subgroup H to the coset aH, sending
$h \mapsto ah$. (Why is this a bijective map?) Thus

(6.9) Each coset aH has the same number of elements as H does.


Since G is the union of the cosets of H and since these cosets do not overlap,
we obtain the important Counting Formula


(6.10) $|G| = |H| \, [G : H],$


where |G| denotes the order of the group, as in (2.10), and where the equality has
the obvious meaning if some terms are infinite. In our example (6.7), this formula
reads $6 = 2 \cdot 3$.
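
The cosets in (6.7) and the Counting Formula can be reproduced by direct computation. The sketch below realizes the symmetric group on three letters by permutations of {0, 1, 2}, taking x to be a 3-cycle and y a transposition; this particular choice of x and y is our assumption, made so that the relations $x^3 = 1$, $y^2 = 1$, $yx = x^2y$ hold.

```python
def compose(p, q):
    """Return p o q, where (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


e, x, y = (0, 1, 2), (1, 2, 0), (1, 0, 2)
x2 = compose(x, x)
xy, x2y = compose(x, y), compose(x2, y)
assert compose(x2, x) == e and compose(y, y) == e and compose(y, x) == x2y

names = {e: "1", x: "x", x2: "x^2", y: "y", xy: "xy", x2y: "x^2y"}
G = list(names)
H = [e, xy]

# Each a in G gives the coset aH; the set removes repeated notations for the same coset.
cosets = {frozenset(compose(a, h) for h in H) for a in G}
index = len(cosets)
named = sorted(sorted(names[g] for g in c) for c in cosets)

print(named)  # [['1', 'xy'], ['x', 'x^2y'], ['x^2', 'y']]
print(len(G), "=", len(H), "*", index)  # 6 = 2 * 3
```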

The fact that the two terms on the right side of equation (6.10) must divide the 
left side is very important. Here is one of these conclusions, stated formally: 


(6.11) Corollary. Lagrange's Theorem: Let G be a finite group, and let H be a
subgroup of G. The order of H divides the order of G. □


In Section 2 we defined the order of an element a € G to be the order of the 
cyclic subgroup generated by a. Hence Lagrange’s Theorem implies the following: 


(6.12) The order of an element divides the order of the group. 
This fact has a remarkable consequence: 


(6.13) Corollary. Suppose that a group G has p elements and that p is a prime integer.
Let $a \in G$ be any element, not the identity. Then G is the cyclic group
$\{1, a, \ldots, a^{p-1}\}$ generated by a.


For, since $a \ne 1$, the order of a is greater than 1, and it divides |G| = p. Hence it
is equal to p. Since G has order p, $\{1, a, \ldots, a^{p-1}\}$ is the whole group. □


Thus we have classified all groups of prime order p. They form one isomor- 
phism class, the class of a cyclic group of order p. 

The Counting Formula can also be applied when a homomorphism is given.
Let $\varphi: G \to G'$ be a homomorphism. As we saw in (5.13), the left cosets of $\ker \varphi$
are the fibres of the map $\varphi$. They are in bijective correspondence with the elements
in the image. Hence
(6.14) $[G : \ker \varphi] = |\operatorname{im} \varphi|.$


Thus (6.10) implies the following:


(6.15) Corollary. Let $\varphi: G \to G'$ be a homomorphism of finite groups. Then
$|G| = |\ker \varphi| \cdot |\operatorname{im} \varphi|.$
Thus $|\ker \varphi|$ divides |G|, and $|\operatorname{im} \varphi|$ divides both |G| and |G'|.


Proof. The formula is obtained by combining (6.10) and (6.14), and it implies
that $|\ker \varphi|$ and $|\operatorname{im} \varphi|$ divide |G|. Since $\operatorname{im} \varphi$ is a subgroup of G', $|\operatorname{im} \varphi|$ divides
|G'| as well. □
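
Corollary (6.15) is easy to confirm on a small example. The sketch below uses a homomorphism of our own choosing, multiplication by 4 on the additive group of integers modulo 12, and counts its kernel and image.

```python
n = 12
G = range(n)   # integers mod 12 under addition


def phi(a):
    """Hypothetical example: multiplication by 4 modulo 12."""
    return (4 * a) % n


# phi is a homomorphism of the additive group to itself.
assert all(phi((a + b) % n) == (phi(a) + phi(b)) % n for a in G for b in G)

kernel = [a for a in G if phi(a) == 0]
image = sorted({phi(a) for a in G})

print(kernel, image)  # [0, 3, 6, 9] [0, 4, 8]
assert len(G) == len(kernel) * len(image)
```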
Let us go back for a moment to the definition of cosets. We made the decision
to work with left cosets aH. One can also define right cosets of a subgroup H
and repeat the above discussion for them. The right cosets of a subgroup H are the
sets


(6.16) $Ha = \{ha \mid h \in H\},$
which are equivalence classes for the relation (right congruence)
$a \equiv b$ if b = ha, for some $h \in H$.


Right cosets need not be the same as left cosets. For instance, the right cosets of the
subgroup {1, xy} of $S_3$ are


(6.17) $\{1, xy\} = H = Hxy, \quad \{x, y\} = Hx = Hy, \quad \{x^2, x^2y\} = Hx^2 = Hx^2y.$


This partition of $S_3$ is not the same as the partition (6.7) into left cosets.
However, if N is a normal subgroup, then right and left cosets agree. 


(6.18) Proposition. A subgroup H of a group G is normal if and only if every left 
coset is also a right coset. If H is normal, then aH = Ha for every a € G. 


Proof. Suppose that H is normal. For any $h \in H$ and any $a \in G$,
$ah = (aha^{-1})a.$


Since H is a normal subgroup, the conjugate element $k = aha^{-1}$ is in H. Thus the element
ah = ka is in aH and also in Ha. This shows that $aH \subset Ha$. Similarly,
$aH \supset Ha$, and so these two cosets are equal. Conversely, suppose that H is not normal.
Then there are elements $h \in H$ and $a \in G$ so that $aha^{-1}$ is not in H. Then ah
is in the left coset aH but not in the right coset Ha. If it were, say ah = h'a for
some $h' \in H$, then we would have $aha^{-1} = h' \in H$, contrary to our hypothesis.
On the other hand, aH and Ha do have an element in common, namely the element
a. So aH cannot be equal to some other right coset. This shows that the partition into left
cosets is not the same as the partition into right cosets. □
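
Proposition (6.18) can be observed on the symmetric group on three letters by comparing the two partitions. As before, the realization of x as a 3-cycle and y as a transposition on {0, 1, 2} is our assumption. The subgroup {1, xy} is not normal, while the cyclic subgroup generated by x is.

```python
def compose(p, q):
    """Return p o q, where (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


e, x, y = (0, 1, 2), (1, 2, 0), (1, 0, 2)
x2 = compose(x, x)
xy, x2y = compose(x, y), compose(x2, y)
G = [e, x, x2, y, xy, x2y]


def left_cosets(H):
    return {frozenset(compose(a, h) for h in H) for a in G}


def right_cosets(H):
    return {frozenset(compose(h, a) for h in H) for a in G}


H = [e, xy]       # not a normal subgroup
N = [e, x, x2]    # the cyclic subgroup generated by x

print(left_cosets(H) == right_cosets(H))  # False
print(left_cosets(N) == right_cosets(N))  # True
```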


7. RESTRICTION OF A HOMOMORPHISM TO A SUBGROUP


The usual way to get an understanding of a complicated group is to study some less 
complicated subgroups. If it made sense to single out one method in group theory as 
the most important, this would be it. For example, the general linear group $GL_n$ is
much more complicated than the group of invertible upper triangular matrices. We
expect to answer any question about upper triangular matrices which comes up. And
by taking products of upper and lower triangular matrices, we can cover most of the
group $GL_n$. Of course, the trick is to get back information about a group from an understanding
of its subgroups. We don't have general rules about how this should be
done. But whenever a new construction with groups is made, we should study its effect
on subgroups. This is what is meant by restriction to a subgroup. We will do
this for subgroups and homomorphisms in this section.

Let H be a subgroup of a group G. Let us first consider the case that a second
subgroup K is given. The restriction of K to H is the intersection $K \cap H$. The following
proposition is a simple exercise.


(7.1) Proposition. The intersection $K \cap H$ of two subgroups is a subgroup of H.
If K is a normal subgroup of G, then $K \cap H$ is a normal subgroup of H. □


There is not very much more to be said here, but if G is a finite group, we may be
able to apply the Counting Formula (6.10), especially Lagrange's Theorem, to get
information about the intersection. Namely, $K \cap H$ is a subgroup of H and also a
subgroup of K. So its order divides both of the orders |H| and |K|. If |H| and |K|
have no common factor other than 1, we can conclude that $K \cap H = \{1\}$.

Now suppose that a homomorphism $\varphi: G \to G'$ is given and that H is a subgroup
of G as before. Then we may restrict $\varphi$ to H, obtaining a homomorphism


(7.2) $\varphi|_H: H \to G'.$


This means that we take the same map $\varphi$ but restrict its domain to H. In other
words, $\varphi|_H(h) = \varphi(h)$ for all $h \in H$. The restriction is a homomorphism because
$\varphi$ is one.

The kernel of $\varphi|_H$ is the intersection of $\ker \varphi$ with H:


(7.3) $\ker \varphi|_H = (\ker \varphi) \cap H.$


This is clear from the definition of kernel: $\varphi(h) = 1$ if and only if $h \in \ker \varphi$.

Again, the Counting Formula may help to describe this restriction. For, the
image of $\varphi|_H$ is $\varphi(H)$. According to Corollary (6.15), $|\varphi(H)|$ divides both |H| and
|G'|. So if |H| and |G'| have no common factor other than 1, $\varphi(H) = \{1\}$. Then we can conclude
that $H \subset \ker \varphi$.

For example, the sign of a permutation is described by a homomorphism
(4.2b), $S_n \to \{\pm 1\}$. The range of this homomorphism has order 2, and its kernel is
the alternating group. If a subgroup H of $S_n$ has odd order, then the restriction of
this homomorphism to H is trivial, which means that H is contained in the alternating
group, that is, H consists of even permutations. This will be so when H is the
cyclic subgroup generated by a permutation p whose order in the group is odd. It follows
that every permutation of odd order is an even permutation. On the other hand,
we cannot draw any conclusion about permutations of even order. They may be odd
or even.
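
The conclusion about permutations of odd order can be confirmed exhaustively for a small symmetric group. The sketch below checks all 120 permutations of five letters; the tuple encoding and the inversion-count definition of the sign are our assumptions.

```python
from itertools import permutations


def compose(p, q):
    """Return p o q, where (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def order(p):
    identity = tuple(range(len(p)))
    power, m = p, 1
    while power != identity:
        power, m = compose(power, p), m + 1
    return m


def sign(p):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1


S5 = list(permutations(range(5)))

odd_order = [p for p in S5 if order(p) % 2 == 1]
assert all(sign(p) == 1 for p in odd_order)

# Permutations of even order occur with both signs.
signs_for_even_order = sorted({sign(p) for p in S5 if order(p) % 2 == 0})

print(len(odd_order), signs_for_even_order)  # 45 [-1, 1]
```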

When a homomorphism $\varphi: G \to G'$ and a subgroup H' of G' are given, we
may also restrict $\varphi$ to H'. Here we must cut down the domain G of $\varphi$ suitably, in
order to get a map to H'. The natural thing to do is to cut down the domain as little
as possible by taking the entire inverse image of H':


(7.4) Proposition. Let $\varphi: G \to G'$ be a homomorphism, and let H' be a subgroup
of G'. Denote the inverse image $\varphi^{-1}(H') = \{x \in G \mid \varphi(x) \in H'\}$ by H.
Then
(a) H is a subgroup of G.
(b) If H' is a normal subgroup of G', then H is a normal subgroup of G.
(c) H contains $\ker \varphi$.
(d) The restriction of $\varphi$ to H defines a homomorphism $H \to H'$, whose kernel is
$\ker \varphi$.


For example, consider the determinant homomorphism $\det: GL_n(\mathbb{R}) \to \mathbb{R}^\times$.
The set P of positive real numbers is a subgroup of $\mathbb{R}^\times$, and its inverse image is the
set of invertible n × n matrices with positive determinant, which is a normal subgroup
of $GL_n(\mathbb{R})$.


Proof of Proposition (7.4). This proof is also a simple exercise, but we must
keep in mind that φ⁻¹ is not a map. By definition, H is the set of elements x ∈ G
such that φ(x) ∈ H′. We verify the conditions for a subgroup. Identity: 1 ∈ H
because φ(1) = 1 ∈ H′. Closure: Suppose that x, y ∈ H. This means that φ(x) and
φ(y) are in H′. Since H′ is a subgroup, φ(x)φ(y) ∈ H′. Since φ is a homomorphism,
φ(x)φ(y) = φ(xy) ∈ H′. Therefore xy ∈ H. Inverses: Suppose x ∈ H, so
that φ(x) ∈ H′; then φ(x)⁻¹ ∈ H′ because H′ is a subgroup. Since φ is a
homomorphism, φ(x)⁻¹ = φ(x⁻¹). Thus x⁻¹ ∈ H.

Suppose that H′ is a normal subgroup, and let x ∈ H and g ∈ G. Then
φ(gxg⁻¹) = φ(g)φ(x)φ(g)⁻¹, and φ(x) ∈ H′. Therefore φ(gxg⁻¹) ∈ H′, and this
shows that gxg⁻¹ ∈ H. Next, H contains ker φ because if x ∈ ker φ then φ(x) = 1,
and 1 ∈ H′. So x ∈ φ⁻¹(H′). The last assertion should be clear. □
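
A small finite case makes the proposition concrete. The sketch below is an illustration under assumptions chosen here, not an example from the text: G is the additive group of integers modulo 12, G′ is the additive group of integers modulo 6, φ is reduction modulo 6, and H′ = {0, 3}. It computes the inverse image H and checks that H is a subgroup containing ker φ.

```python
n, m = 12, 6           # G = integers mod 12, G' = integers mod 6 (additive notation)
G = range(n)

def phi(x):
    """Reduction mod 6; a homomorphism from Z/12 to Z/6 because 6 divides 12."""
    return x % m

def is_subgroup(S, modulus):
    """Check identity, closure, and inverses for a subset of Z/modulus."""
    return (0 in S
            and all((a + b) % modulus in S for a in S for b in S)
            and all((-a) % modulus in S for a in S))

H_prime = {0, 3}                                  # a subgroup of Z/6
H = {x for x in G if phi(x) in H_prime}           # inverse image of H'
kernel = {x for x in G if phi(x) == 0}            # ker phi

print(sorted(H))        # [0, 3, 6, 9]
print(sorted(kernel))   # [0, 6]
print(is_subgroup(H, n), kernel <= H)
```

The kernel of the restricted map H → H′ is again {0, 6}, in agreement with part (d).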


8. PRODUCTS OF GROUPS 


Let G, G′ be two groups. The product set G × G′ can be made into a group by
component-wise multiplication. That is, we define multiplication of pairs by the rule

(8.1) (a, a′), (b, b′) ⇝ (ab, a′b′),

for a, b ∈ G and a′, b′ ∈ G′. The pair (1, 1) is an identity, and
(a, a′)⁻¹ = (a⁻¹, a′⁻¹). The associative law in G × G′ follows from the fact that it
holds in G and in G′. The group thus obtained is called the product of G and G′ and
is denoted by G × G′. Its order is the product of the orders of G and G′.

The product group is related to the two factors G, G′ in a simple way, which
we can sum up in terms of some homomorphisms

(8.2) i: G → G × G′,  i′: G′ → G × G′,  p: G × G′ → G,  p′: G × G′ → G′,

defined by

i(x) = (x, 1),  i′(x′) = (1, x′),
p(x, x′) = x,  p′(x, x′) = x′.

The maps i, i′ are injective and may be used to identify G, G′ with the subgroups
G × 1, 1 × G′ of G × G′. The maps p, p′ are surjective, ker p = 1 × G′, and
ker p′ = G × 1. These maps are called the projections. Being kernels, G × 1 and
1 × G′ are normal subgroups of G × G′.


(8.3) Proposition. The mapping property of products: Let H be any group. The
homomorphisms Φ: H → G × G′ are in bijective correspondence with pairs
(φ, φ′) of homomorphisms

φ: H → G,  φ′: H → G′.

The kernel of Φ is the intersection (ker φ) ∩ (ker φ′).


Proof. Given a pair (φ, φ′) of homomorphisms, we define the corresponding
homomorphism

Φ: H → G × G′

by the rule Φ(h) = (φ(h), φ′(h)). This is easily seen to be a homomorphism.
Conversely, given Φ, we obtain φ and φ′ by composition with the projections, as

φ = pΦ,  φ′ = p′Φ.

Obviously, Φ(h) = (1, 1) if and only if φ(h) = 1 and φ′(h) = 1, which shows that
ker Φ = (ker φ) ∩ (ker φ′). □


It is clearly desirable to decompose a given group G as a product, meaning to find
two groups H and H′ such that G is isomorphic to the product H × H′. For the
groups H, H′ will be smaller and therefore simpler, and the relation between
H × H′ and its factors is easily understood. Unfortunately, it is quite rare that a
given group is a product, but it does happen occasionally.

For example, it is rather surprising that a cyclic group of order 6 can be
decomposed: A cyclic group C₆ of order 6 is isomorphic to the product C₂ × C₃ of
cyclic groups of orders 2 and 3. This can be shown using the mapping property just
discussed. Say that C₆ = {1, x, x², ..., x⁵}, C₂ = {1, y}, C₃ = {1, z, z²}. The rule

φ: C₆ → C₂ × C₃

defined by φ(xⁱ) = (yⁱ, zⁱ) is a homomorphism, and its kernel is the set of elements
xⁱ such that yⁱ = 1 and zⁱ = 1. Now yⁱ = 1 if and only if i is divisible by 2, while
zⁱ = 1 if and only if i is divisible by 3. There is no integer between 1 and 5 which is
divisible by both 2 and 3. Therefore ker φ = {1}, and φ is injective. Since both
groups have order 6, φ is bijective and hence is an isomorphism. □

The same argument works for a cyclic group of order rs, whenever the two
integers r and s have no common factor.


(8.4) Proposition. Let r, s be integers with no common factor. A cyclic group of
order rs is isomorphic to the product of a cyclic group of order r and a cyclic group
of order s. □


On the other hand, a cyclic group of order 4 is not isomorphic to a product of two
cyclic groups of order 2. For it is easily seen that every element of C₂ × C₂ has order
1 or 2, whereas a cyclic group of order 4 contains two elements of order 4. And, the
proposition makes no assertions about a group which is not cyclic.
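
Both the decomposition and the counterexample can be verified mechanically. In the sketch below, which is an illustration and not the text's proof, cyclic groups are modeled additively as integers modulo r (an assumption made here for convenience), so the map xⁱ ⇝ (yⁱ, zⁱ) becomes i ⇝ (i mod r, i mod s). The function names are hypothetical.

```python
from math import gcd

def product_map_is_bijective(r, s):
    """Is i -> (i mod r, i mod s) a bijection from Z/rs onto Z/r x Z/s?"""
    images = {(i % r, i % s) for i in range(r * s)}
    return len(images) == r * s

def orders_in_product(r, s):
    """Sorted list of the orders of all elements of Z/r x Z/s."""
    orders = []
    for a in range(r):
        for b in range(s):
            k = 1
            while ((k * a) % r, (k * b) % s) != (0, 0):
                k += 1
            orders.append(k)
    return sorted(orders)

def orders_in_cyclic(n):
    """Sorted list of the orders of all elements of Z/n."""
    return sorted(n // gcd(n, a) for a in range(n))

print(product_map_is_bijective(2, 3))   # True: C6 is isomorphic to C2 x C3
print(product_map_is_bijective(2, 2))   # False: the map from C4 is not injective
print(orders_in_product(2, 2))          # [1, 2, 2, 2]
print(orders_in_cyclic(4))              # [1, 2, 4, 4]
```

Since an isomorphism preserves the orders of elements, the two different lists of orders show that no isomorphism between C₄ and C₂ × C₂ exists, not merely that this particular map fails.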

Let A and B be subsets of a group G. Then we denote the set of products of
elements of A and B by

(8.5) AB = {x ∈ G | x = ab for some a ∈ A and b ∈ B}.

The next proposition characterizes product groups.

(8.6) Proposition. Let H and K be subgroups of a group G.

(a) If H ∩ K = {1}, the product map p: H × K → G defined by p(h, k) = hk is
injective. Its image is the subset HK.

(b) If either H or K is a normal subgroup of G, then the product sets HK and KH
are equal, and HK is a subgroup of G.

(c) If H and K are normal, H ∩ K = {1}, and HK = G, then G is isomorphic to
the product group H × K.


Proof. (a) Let (h₁, k₁), (h₂, k₂) be elements of H × K such that h₁k₁ = h₂k₂.
Multiplying both sides of this equation on the left by h₁⁻¹ and on the right by k₂⁻¹,
we find k₁k₂⁻¹ = h₁⁻¹h₂. Since H ∩ K = {1}, k₁k₂⁻¹ = h₁⁻¹h₂ = 1, hence h₁ = h₂
and k₁ = k₂. This shows that p is injective.

(b) Suppose that H is a normal subgroup of G, and let h ∈ H and k ∈ K. Note that
kh = (khk⁻¹)k. Since H is normal, khk⁻¹ ∈ H. Therefore kh ∈ HK, which shows
that KH ⊂ HK. The proof of the other inclusion is similar. The fact that HK is a
subgroup now follows easily. For closure under multiplication, note that in a product
(hk)(h′k′) = h(kh′)k′, the middle term kh′ is in KH = HK, say kh′ = h″k″. Then
hkh′k′ = (hh″)(k″k′) ∈ HK. Closure under inverses is similar: (hk)⁻¹ = k⁻¹h⁻¹ ∈
KH = HK. And of course, 1 = 1·1 ∈ HK. Thus HK is a subgroup. The proof is
similar in the case that K is normal.

(c) Assume that both subgroups are normal and that H ∩ K = {1}. Consider the
product (hkh⁻¹)k⁻¹ = h(kh⁻¹k⁻¹). Since K is a normal subgroup, the left side is in K.
Since H is normal, the right side is in H. Thus this product is in the intersection
H ∩ K, i.e., hkh⁻¹k⁻¹ = 1. Therefore hk = kh. This being known, the fact that
p is a homomorphism follows directly: In the group H × K, the product rule is
(h₁, k₁)(h₂, k₂) = (h₁h₂, k₁k₂), and this element corresponds to h₁h₂k₁k₂ in G, while
in G the products h₁k₁ and h₂k₂ multiply as h₁k₁h₂k₂. Since h₂k₁ = k₁h₂, the products
are equal. Part (a) shows that p is injective, and the assumption that HK = G shows
that p is surjective. □


It is important to note that the product map p: H × K → G will not be a
group homomorphism unless the two subgroups commute with each other.
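
The symmetric group on three symbols shows how each hypothesis of Proposition (8.6) matters. The following sketch is illustrative only; the tuple representation of permutations and the names `H`, `K`, `K2` are choices made here. H is the normal subgroup of order 3, while K and K2 are two subgroups of order 2, neither of which is normal.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

E = (0, 1, 2)
G = set(permutations(range(3)))
H = {E, (1, 2, 0), (2, 0, 1)}      # the cyclic subgroup of order 3 (normal)
K = {E, (1, 0, 2)}                 # generated by the transposition of 0 and 1
K2 = {E, (2, 1, 0)}                # generated by the transposition of 0 and 2

def product_set(A, B):
    return {compose(a, b) for a in A for b in B}

def is_subgroup(S):
    # For a finite subset, containing 1 and being closed under products suffices.
    return E in S and all(compose(a, b) in S for a in S for b in S)

HK = product_set(H, K)
injective = len(HK) == len(H) * len(K)          # part (a): H ∩ K = {1}
hk_is_subgroup = is_subgroup(HK)                # part (b): H is normal

# p(h1 h2, k1 k2) versus p(h1, k1) p(h2, k2), over all pairs of pairs.
p_is_homomorphism = all(
    compose(compose(h1, h2), compose(k1, k2)) == compose(compose(h1, k1), compose(h2, k2))
    for h1 in H for h2 in H for k1 in K for k2 in K
)

kk2_is_subgroup = is_subgroup(product_set(K, K2))  # neither factor is normal

print(injective, hk_is_subgroup, HK == G)   # True True True
print(p_is_homomorphism)                    # False: K is not normal, so H and K do not commute
print(kk2_is_subgroup)                      # False: the product set has 4 elements
```

Here p is a bijection from H × K onto the whole group, yet it is not a homomorphism, which is consistent with the fact that this group is not abelian while H × K is.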


9. MODULAR ARITHMETIC 


In this section we discuss Gauss’s definition of congruence of integers, which is one 
of the most important concepts in number theory. We work with a fixed, but arbi- 
trary, positive integer n throughout this section. 

Two integers a, b are said to be congruent modulo n, written

(9.1) a ≡ b (modulo n),

if n divides b − a, or if b = a + nk for some integer k. It is easy to check that this
is an equivalence relation. So we may consider the equivalence classes, called
congruence classes modulo n or residue classes modulo n, defined by this relation, as in
Section 5. Let us denote the congruence class of an integer a by the symbol ā. It is
the set of integers

(9.2) ā = {..., a − 2n, a − n, a, a + n, a + 2n, ...}.

If a and b are integers, the equation ā = b̄ means that n divides b − a.
The congruence class of 0 is the subgroup

0̄ = nℤ = {..., −n, 0, n, 2n, ...}

of the additive group ℤ⁺ consisting of all multiples of n. The other congruence
classes are the cosets of this subgroup. Unfortunately, we have a slight notational
problem here, because the notation nℤ is like the one we use for a coset. But nℤ
is not a coset; it is a subgroup of ℤ⁺. The notation for a coset of a subgroup H
analogous to (6.1), but using additive notation for the law of composition, is

a + H = {a + h | h ∈ H}.

In order to avoid writing a coset as a + nℤ, let us denote the subgroup nℤ by H.
Then the cosets of H are the sets

(9.3) a + H = {a + nk | k ∈ ℤ}.

They are the congruence classes ā = a + H.


The n integers 0, 1, ..., n − 1 form a natural set of representative elements for
the congruence classes:


(9.4) Proposition. There are n congruence classes modulo n, namely
0̄, 1̄, ..., \overline{n − 1}.
Or, the index [ℤ : nℤ] of the subgroup nℤ in ℤ is n.

Proof. Let a be an arbitrary integer. Then we may use division with remainder
to write

a = nq + r,

where q, r are integers and where the remainder r is in the range 0 ≤ r < n. Then a
is congruent to the remainder: a ≡ r (modulo n). Thus ā = r̄. This shows that ā is
one of the congruence classes listed in the proposition. On the other hand, if a and b
are distinct nonnegative integers less than n, say a < b, then b − a is less than n and
different from zero, so n does not divide b − a. Thus a ≢ b (modulo n), which means
that ā ≠ b̄. Therefore the n classes 0̄, 1̄, ..., \overline{n − 1} are distinct. □
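
The proof is effectively an algorithm: division with remainder picks out the representative of each class. The sketch below illustrates this with Python's built-in `divmod`, which for a positive modulus n returns a remainder in the range 0 ≤ r < n even when a is negative; the function name is a hypothetical one chosen here.

```python
def class_representative(a, n):
    """Return the remainder r with a = n*q + r and 0 <= r < n (n positive)."""
    q, r = divmod(a, n)
    assert a == n * q + r and 0 <= r < n
    return r

n = 5
representatives = {class_representative(a, n) for a in range(-20, 21)}
print(sorted(representatives))           # [0, 1, 2, 3, 4]: exactly n classes
print(class_representative(-7, 13))      # 6, because -7 = 13*(-1) + 6
```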


The main point about congruence classes is that addition and multiplication of
integers preserve congruences modulo n, and therefore these laws can be used to
define addition and multiplication of congruence classes. This is expressed by saying
that the set of congruence classes forms a ring. We will study rings in Chapter 10.

Let ā and b̄ be congruence classes represented by integers a and b. Their sum
is defined to be the congruence class of a + b, and their product is defined to be the
class of ab. In other words, we define

(9.5) ā + b̄ = \overline{a + b} and ā b̄ = \overline{ab}.

This definition needs some justification, because the same congruence class ā can be
represented by many different integers. Any integer a′ congruent to a modulo n
represents the same class. So it had better be true that if a′ ≡ a and b′ ≡ b, then
a′ + b′ ≡ a + b and a′b′ ≡ ab. Fortunately, this is so.


(9.6) Lemma. If a′ ≡ a and b′ ≡ b (modulo n), then a′ + b′ ≡ a + b (modulo
n) and a′b′ ≡ ab (modulo n).


Proof. Assume that a′ ≡ a and b′ ≡ b, so that a′ = a + nr and
b′ = b + ns for some integers r, s. Then a′ + b′ = a + b + n(r + s), which
shows that a′ + b′ ≡ a + b. Similarly, a′b′ = (a + nr)(b + ns) =
ab + n(as + rb + nrs), which shows that a′b′ ≡ ab, as required. □
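
A numerical spot check of Lemma (9.6) is easy to write, though it is of course no substitute for the proof. The sketch below replaces a and b by other representatives a + nr and b + ns of the same classes and confirms that the class of the sum and of the product does not change. The function name and the search ranges are arbitrary choices made here.

```python
def lemma_holds(n, bound=6, shifts=2):
    """Check a'+b' ≡ a+b and a'b' ≡ ab (mod n) for many representatives."""
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            for r in range(-shifts, shifts + 1):
                for s in range(-shifts, shifts + 1):
                    a2, b2 = a + n * r, b + n * s       # a' ≡ a, b' ≡ b
                    if (a2 + b2 - (a + b)) % n != 0:
                        return False
                    if (a2 * b2 - a * b) % n != 0:
                        return False
    return True

print(all(lemma_holds(n) for n in range(1, 15)))   # True
```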


The associative, commutative, and distributive laws hold for the laws of
composition (9.5) because they hold for addition and multiplication of integers. For
example, the formal verification of the distributive law is as follows:

ā(b̄ + c̄) = ā · \overline{b + c} = \overline{a(b + c)}  (definition of + and × for congruence classes)
= \overline{ab + ac}  (distributive law in the integers)
= \overline{ab} + \overline{ac} = ā b̄ + ā c̄  (definition of + and × for congruence classes).

The set of congruence classes modulo n is usually denoted by

(9.7) ℤ/nℤ.

Computation of addition, subtraction, and multiplication in ℤ/nℤ can be made
explicitly by working with integers and taking remainders on division by n. That is
what the formulas (9.5) mean. They tell us that the map

(9.8) ℤ → ℤ/nℤ

sending an integer a to its congruence class ā is compatible with addition and
multiplication. Therefore computations can be made in the integers and then carried over
to ℤ/nℤ at the end. However, doing this is not efficient, because computations are
simpler if the numbers are kept small. We can keep them small by computing the
remainder after some part of a computation has been made.

Thus if n = 13, so that

ℤ/nℤ = {0̄, 1̄, ..., \overline{12}},

then

(7̄ + 9̄)(\overline{11} + 6̄)

can be computed as 7̄ + 9̄ = 3̄, \overline{11} + 6̄ = 4̄, 3̄ · 4̄ = \overline{12}.
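
The two strategies described here, reducing only at the end or reducing after each step, can be compared directly. The sketch below uses n = 13 and the sums 7 + 9 and 11 + 6 quoted above; the large power at the end is an extra illustration added here, using Python's built-in three-argument `pow`, which reduces modulo n after every multiplication.

```python
n = 13

# Reduce only at the end: 16 * 17 = 272, and 272 = 13 * 20 + 12.
late = ((7 + 9) * (11 + 6)) % n

# Reduce after every step: 7 + 9 -> 3, 11 + 6 -> 4, 3 * 4 -> 12.
first = (7 + 9) % n
second = (11 + 6) % n
early = (first * second) % n

print(first, second, early, late)   # 3 4 12 12

# The same idea keeps huge powers manageable.
big = (7 ** 1000) % n               # builds a very large integer first
small = pow(7, 1000, n)             # never leaves the range 0..n-1 for long
print(big == small)                 # True
```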
The bars over the numbers are a nuisance, so they are often left off. One just
has to remember the following rule:


(9.9) To say a = b in ℤ/nℤ means a ≡ b (modulo n).


10. QUOTIENT GROUPS 


We saw in the last section that the congruence classes of integers modulo n are the
cosets of the subgroup nℤ of ℤ⁺. So addition of congruence classes gives us a law
of composition on the set of these cosets. In this section we will show that a law of
composition can be defined on the cosets of a normal subgroup N of any group G.
We will show how to make the set of cosets into a group, called a quotient group.

Addition of angles is a familiar example of the quotient construction. Every
real number represents an angle, and two real numbers represent the same angle if
they differ by an integer multiple of 2π. This is very familiar. The point of the
example is that addition of angles is defined in terms of addition of real numbers. The
group of angles is a quotient group, in which G = ℝ⁺ and N is the subgroup of
integer multiples of 2π.

We recall a notation introduced in Section 8: If A and B are subsets of a group
G, then

AB = {ab | a ∈ A, b ∈ B}.

We will call this the product of the two subsets of the group, though in other
contexts the term product may stand for the set A × B.

(10.1) Lemma. Let N be a normal subgroup of a group G. Then the product of
two cosets aN, bN is again a coset, in fact

(aN)(bN) = abN.

Proof. Note that Nb = bN, by (6.18), and since N is a subgroup NN = N.
The following formal manipulation proves the lemma:

(aN)(bN) = a(Nb)N = a(bN)N = abNN = abN. □


This lemma allows us to define multiplication of two cosets C₁, C₂ by this rule:
C₁C₂ is the product set. To compute the product coset, take any elements a ∈ C₁
and b ∈ C₂, so that C₁ = aN and C₂ = bN. Then C₁C₂ = abN is the coset
containing ab. This is the way addition of congruence classes was defined in the last section.

For example, consider the cosets of the unit circle N in G = ℂ×. As we saw in
Section 5, its cosets are the concentric circles

C_r = {z | |z| = r}.

Formula (10.1) amounts to the assertion that if |α| = r and |β| = s, then
|αβ| = rs:

C_r C_s = C_rs.

The assumption that N is a normal subgroup of G is crucial to (10.1). If H
is not a normal subgroup of G, then there will be left cosets C₁, C₂ of H in G whose
products do not lie in a single left coset. For to say H is not normal means there are
elements h ∈ H and a ∈ G so that aha⁻¹ ∉ H. Then the set

(10.2) (aH)(a⁻¹H)

does not lie in any left coset. It contains a1a⁻¹1 = 1, which is an element of H. So
if the set (10.2) is contained in a coset, that coset must be H = 1H. But it also
contains aha⁻¹1, which is not in H. □
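
The failure for a subgroup that is not normal can be seen in the symmetric group on three symbols. The sketch below is an illustration under choices made here (permutations as tuples, H generated by one transposition, N the subgroup of order 3): it forms the set (aH)(a⁻¹H) for a suitable a and checks that it fits in no left coset of H, whereas products of cosets of the normal subgroup N are always cosets.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, x in enumerate(p):
        inv[x] = i
    return tuple(inv)

def left_coset(a, S):
    return frozenset(compose(a, s) for s in S)

def product_set(A, B):
    return frozenset(compose(x, y) for x in A for y in B)

G = list(permutations(range(3)))
H = {(0, 1, 2), (1, 0, 2)}               # order 2, not normal
N = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}    # order 3, normal

cosets_H = {left_coset(g, H) for g in G}
cosets_N = {left_coset(g, N) for g in G}

a = (2, 1, 0)                            # conjugating by a moves H off itself
bad_product = product_set(left_coset(a, H), left_coset(inverse(a), H))
fits_in_a_coset = any(bad_product <= c for c in cosets_H)

normal_case_ok = all(product_set(c1, c2) in cosets_N
                     for c1 in cosets_N for c2 in cosets_N)

print(len(bad_product), fits_in_a_coset)   # 4 False: too large for a coset of order 2
print(normal_case_ok)                      # True
```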


It is customary to denote the set of cosets of a normal subgroup N of G by the
symbol

(10.3) G/N = set of cosets of N in G.

This agrees with the notation ℤ/nℤ introduced in Section 9. Another notation we
will frequently use for the set of cosets is the bar notation:

G/N = Ḡ and aN = ā,

so that ā denotes the coset containing a. This is natural when we want to consider
the map

(10.4) π: G → Ḡ = G/N sending a ⇝ ā = aN.

(10.5) Theorem. With the law of composition defined above, Ḡ = G/N is a
group, and the map π (10.4) is a homomorphism whose kernel is N.
The order of G/N is the index [G : N] of N in G.


(10.6) Corollary. Every normal subgroup of a group G is the kernel of a
homomorphism. □

This corollary allows us to apply everything that we know about homomorphisms to
improve our understanding of normal subgroups.


Proof of Theorem (10.5). First note that π is compatible with the laws of
composition: Since multiplication of cosets is defined by multiplication of elements,
π(a)π(b) = π(ab). Moreover, the elements of G having the same image as the
identity element 1 are those in N: 1̄ = 1N = N. The group axioms in Ḡ follow from
Lemma (10.7):


(10.7) Lemma. Let G be a group, and let S be any set with a law of composition.
Let φ: G → S be a surjective map which has the property φ(a)φ(b) = φ(ab) for
all a, b in G. Then S is a group.


Proof. Actually, any law concerning multiplication which holds in G will be
carried over to S. The proof of the associative law is this: Let s₁, s₂, s₃ ∈ S. Since φ
is surjective, we know that sᵢ = φ(aᵢ) for some aᵢ ∈ G. Then

(s₁s₂)s₃ = (φ(a₁)φ(a₂))φ(a₃) = φ(a₁a₂)φ(a₃) = φ(a₁a₂a₃)
= φ(a₁)φ(a₂a₃) = φ(a₁)(φ(a₂)φ(a₃)) = s₁(s₂s₃).

We leave the other group axioms as an exercise. □


(10.8) Figure. A schematic diagram of coset multiplication. 


For example, let G = ℝ× be the multiplicative group of nonzero real numbers,
and let P be the subgroup of positive real numbers. There are two cosets, namely P
and −P = {negative reals}, and Ḡ = G/P is the group of two elements. The
multiplication rule is the familiar rule: (Neg)(Neg) = (Pos), and so on.

The quotient group construction is related to a general homomorphism
φ: G → G′ of groups as follows:


(10.9) Theorem. First Isomorphism Theorem: Let φ: G → G′ be a surjective
group homomorphism, and let N = ker φ. Then G/N is isomorphic to G′ by the
map φ̄ which sends the coset ā = aN to φ(a):

φ̄(ā) = φ(a).

This is our fundamental method of identifying quotient groups. For example, the
absolute value map ℂ× → ℝ× maps the nonzero complex numbers to the positive
real numbers, and its kernel is the unit circle U. So the quotient group ℂ×/U is
isomorphic to the multiplicative group of positive real numbers. Or, the determinant is
a surjective homomorphism GLₙ(ℝ) → ℝ×, whose kernel is the special linear
group SLₙ(ℝ). So the quotient GLₙ(ℝ)/SLₙ(ℝ) is isomorphic to ℝ×.


Proof of the First Isomorphism Theorem. According to Proposition (5.13), the
nonempty fibres of φ are the cosets aN. So we can think of Ḡ in either way, as the
set of cosets or as the set of nonempty fibres of φ. Therefore the map we are looking
for is the one defined in (5.10) for any map of sets. It maps Ḡ bijectively onto the
image of φ, which is equal to G′ because φ is surjective. By construction it is
compatible with multiplication: φ̄(ā b̄) = φ(ab) = φ(a)φ(b) = φ̄(ā)φ̄(b̄). □
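
A finite instance of the theorem can be traced step by step. The sketch below is an illustration under assumptions made here: G is the additive group of integers modulo 12, G′ is the additive group of integers modulo 4, and φ is reduction modulo 4. It builds the cosets of N = ker φ, checks that φ is constant on each coset (so that φ̄ is well defined), and checks that φ̄ is bijective and compatible with the laws of composition.

```python
n, m = 12, 4                      # G = Z/12 and G' = Z/4, written additively
G = range(n)

def phi(x):
    """Reduction mod 4; a surjective homomorphism Z/12 -> Z/4 because 4 divides 12."""
    return x % m

N = frozenset(x for x in G if phi(x) == 0)          # the kernel {0, 4, 8}

def coset(a):
    return frozenset((a + k) % n for k in N)

cosets = {coset(a) for a in G}                      # the elements of G/N

# phi takes a single value on each coset, so phi_bar(coset) is well defined.
values = {c: {phi(x) for x in c} for c in cosets}
well_defined = all(len(v) == 1 for v in values.values())
phi_bar = {c: next(iter(v)) for c, v in values.items()}

bijective = sorted(phi_bar.values()) == list(range(m))
compatible = all(
    phi_bar[coset(a + b)] == (phi_bar[coset(a)] + phi_bar[coset(b)]) % m
    for a in G for b in G
)

print(len(cosets), well_defined, bijective, compatible)   # 4 True True True
```

The number of cosets, 4, is the index [G : N] = 12/3, in agreement with Theorem (10.5).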


There are thus very many different kinds of quantities,

which cannot well be enumerated;

and from them arise the different parts of mathematics,

each of which is concerned with a particular kind of quantity.


Leonhard Euler
EXERCISES 


1. The Definition of a Group


1. (a) Verify (1.17) and (1.18) by explicit computation.
(b) Make a multiplication table for S₃.
2. (a) Prove that GLₙ(ℝ) is a group.
(b) Prove that Sₙ is a group.
3. Let S be a set with an associative law of composition and with an identity element.
Prove that the subset of S consisting of invertible elements is a group.
4. Solve for y, given that xyz⁻¹w = 1 in a group.
5. Assume that the equation xyz = 1 holds in a group G. Does it follow that yzx = 1? That
yxz = 1?
6. Write out all ways in which one can form a product of four elements a, b, c, d in the
given order.
7. Let S be any set. Prove that the law of composition defined by ab = a is associative.
8. Give an example of 2 × 2 matrices such that A⁻¹B ≠ BA.
9. Show that if ab = a in a group, then b = 1, and if ab = 1, then b = a⁻¹.
10. Let a, b be elements of a group G. Show that the equation ax = b has a unique solution
in G.
11. Let G be a group, with multiplicative notation. We define an opposite group G° with law
of composition a ∘ b as follows: The underlying set is the same as G, but the law of
composition is the opposite; that is, we define a ∘ b = ba. Prove that this defines a group


2. Subgroups


1. Determine the elements of the cyclic group generated by the matrix [matrix lost in extraction] explicitly.
2. Let a, b be elements of a group G. Assume that a has order 5 and that a³b = ba³. Prove
that ab = ba.
3. Which of the following are subgroups?
(a) GLₙ(ℝ) ⊂ GLₙ(ℂ).
(b) {1, −1} ⊂ ℝ×.
(c) The set of positive integers in ℤ⁺.
(d) The set of positive reals in ℝ×.
(e) The set of all matrices [a 0; 0 0], with a ≠ 0, in GL₂(ℝ).
4. Prove that a nonempty subset H of a group G is a subgroup if for all x, y ∈ H the
element xy⁻¹ is also in H.
5. An nth root of unity is a complex number z such that zⁿ = 1. Prove that the nth roots of
unity form a cyclic subgroup of ℂ× of order n.
6. (a) Find generators and relations analogous to (2.13) for the Klein four group.
(b) Find all subgroups of the Klein four group.
7. Let a and b be integers.
(a) Prove that the subset aℤ + bℤ is a subgroup of ℤ⁺.
(b) Prove that a and b + 7a generate the subgroup aℤ + bℤ.
8. Make a multiplication table for the quaternion group H.
9. Let H be the subgroup generated by two elements a, b of a group G. Prove that if
ab = ba, then H is an abelian group.
10. (a) Assume that an element x of a group has order rs. Find the order of xʳ.
(b) Assuming that x has arbitrary order n, what is the order of xʳ?
11. Prove that in any group the orders of ab and of ba are equal.
12. Describe all groups G which contain no proper subgroup.
13. Prove that every subgroup of a cyclic group is cyclic.
14. Let G be a cyclic group of order n, and let r be an integer dividing n. Prove that G
contains exactly one subgroup of order r.
15. (a) In the definition of subgroup, the identity element in H is required to be the identity
of G. One might require only that H have an identity element, not that it is the same
as the identity in G. Show that if H has an identity at all, then it is the identity in G,
so this definition would be equivalent to the one given.
(b) Show the analogous thing for inverses.
16. (a) Let G be a cyclic group of order 6. How many of its elements generate G?
(b) Answer the same question for cyclic groups of order 5, 8, and 10.
(c) How many elements of a cyclic group of order n are generators for that group?
17. Prove that a group in which every element except the identity has order 2 is abelian.
18. According to Chapter 1 (2.18), the elementary matrices generate GLₙ(ℝ).
(a) Prove that the elementary matrices of the first and third types suffice to generate this
group.
(b) The special linear group SLₙ(ℝ) is the set of real n × n matrices whose determinant
is 1. Show that SLₙ(ℝ) is a subgroup of GLₙ(ℝ).
*(c) Use row reduction to prove that the elementary matrices of the first type generate
SLₙ(ℝ). Do the 2 × 2 case first.
19. Determine the number of elements of order 2 in the symmetric group S.. 
20. (a) Let a,b be elements of an abelian group of orders m,n respectively. What can you 
say about the order of their product ab? 
*(b) Show by example that the product of elements of finite order in a nonabelian group 
need not have finite order. 
21. Prove that the set of elements of finite order in an abelian group is a subgroup. 
22. Prove that the greatest common divisor of a and b, as defined in the text, can be obtained 
by factoring a and 5 into primes and collecting the common factors. 


3. Isomorphisms 


1. Prove that the additive group ℝ⁺ of real numbers is isomorphic to the multiplicative
group P of positive reals.
2. Prove that the products ab and ba are conjugate elements in a group.
3. Let a, b be elements of a group G, and let a′ = bab⁻¹. Prove that a = a′ if and only if a
and b commute.
4. (a) Let b′ = aba⁻¹. Prove that b′ⁿ = abⁿa⁻¹.
(b) Prove that if aba⁻¹ = b², then a³ba⁻³ = b⁸.
5. Let φ: G → G′ be an isomorphism of groups. Prove that the inverse function φ⁻¹ is
also an isomorphism.
6. Let φ: G → G′ be an isomorphism of groups, let x, y ∈ G, and let x′ = φ(x) and
y′ = φ(y).
(a) Prove that the orders of x and of x′ are equal.
(b) Prove that if xyx = yxy, then x′y′x′ = y′x′y′.
(c) Prove that φ(x⁻¹) = x′⁻¹.
7. Prove that the matrices [two matrices lost in extraction] are conjugate elements in the
group GL₂(ℝ) but that they are not conjugate when regarded as elements of SL₂(ℝ).
8. Prove that the matrices [two matrices lost in extraction] are conjugate in GL₂(ℝ).
9. Find an isomorphism from a group G to its opposite group G° (Section 2, exercise 12).
10. Prove that the map A ⇝ (Aᵗ)⁻¹ is an automorphism of GLₙ(ℝ).
11. Prove that the set Aut G of automorphisms of a group G forms a group, the law of
composition being composition of functions.
12. Let G be a group, and let φ: G → G be the map φ(x) = x⁻¹.
(a) Prove that φ is bijective.
(b) Prove that φ is an automorphism if and only if G is abelian.
13. (a) Let G be a group of order 4. Prove that every element of G has order 1, 2, or 4.
(b) Classify groups of order 4 by considering the following two cases:
(i) G contains an element of order 4.
(ii) Every element of G has order < 4.
14. Determine the group of automorphisms of the following groups.
(a) ℤ⁺, (b) a cyclic group of order 10, (c) S₃.
15. Show that the functions f = 1/x, g = (x − 1)/x generate a group of functions, the law
of composition being composition of functions, which is isomorphic to the symmetric
group S₃.
16. Give an example of two isomorphic groups such that there is more than one isomorphism
between them.


4. Homomorphisms


1. Let G be a group, with law of composition written x # y. Let H be a group with law of
composition u ∘ v. What is the condition for a map φ: G → G′ to be a homomorphism?
2. Let φ: G → G′ be a group homomorphism. Prove that for any elements a₁, ..., a_k of
G, φ(a₁ ⋯ a_k) = φ(a₁) ⋯ φ(a_k).
3. Prove that the kernel and image of a homomorphism are subgroups.
4. Describe all homomorphisms φ: ℤ⁺ → ℤ⁺, and determine which are injective, which
are surjective, and which are isomorphisms.
5. Let G be an abelian group. Prove that the nth power map φ: G → G defined by
φ(x) = xⁿ is a homomorphism from G to itself.
6. Let f: ℝ⁺ → ℂ× be the map f(x) = e^{ix}. Prove that f is a homomorphism, and
determine its kernel and image.
7. Prove that the absolute value map | |: ℂ× → ℝ× sending α ⇝ |α| is a
homomorphism, and determine its kernel and image.
8. (a) Find all subgroups of S₃, and determine which are normal.
(b) Find all subgroups of the quaternion group, and determine which are normal.
9. (a) Prove that the composition φ ∘ ψ of two homomorphisms φ, ψ is a homomorphism.
(b) Describe the kernel of φ ∘ ψ.
10. Let φ: G → G′ be a group homomorphism. Prove that φ(x) = φ(y) if and only if
xy⁻¹ ∈ ker φ.
11. Let G, H be cyclic groups, generated by elements x, y. Determine the condition on the
orders m, n of x and y so that the map sending xⁱ ⇝ yⁱ is a group homomorphism.
12. Prove that the n × n matrices M which have the block form [block matrix lost in
extraction] with A ∈ GL_r(ℝ) and D ∈ GL_{n−r}(ℝ) form a subgroup P of GLₙ(ℝ), and
that the map P → GL_r(ℝ) sending M ⇝ A is a homomorphism. What is its kernel?
13. (a) Let H be a subgroup of G, and let g ∈ G. The conjugate subgroup gHg⁻¹ is defined
to be the set of all conjugates ghg⁻¹, where h ∈ H. Prove that gHg⁻¹ is a subgroup of G.
(b) Prove that a subgroup H of a group G is normal if and only if gHg⁻¹ = H for all
g ∈ G.
14. Let N be a normal subgroup of G, and let g ∈ G, n ∈ N. Prove that g⁻¹ng ∈ N.
15. Let φ and ψ be two homomorphisms from a group G to another group G′, and let
H ⊂ G be the subset {x ∈ G | φ(x) = ψ(x)}. Prove or disprove: H is a subgroup of G.
16. Let φ: G → G′ be a group homomorphism, and let x ∈ G be an element of order r.
What can you say about the order of φ(x)?
17. Prove that the center of a group is a normal subgroup.
18. Prove that the center of GLₙ(ℝ) is the subgroup Z = {cI | c ∈ ℝ, c ≠ 0}.
19. Prove that if a group contains exactly one element of order 2, then that element is in the
center of the group.
20. Consider the set U of real 3 × 3 matrices of the form [matrix lost in extraction].
(a) Prove that U is a subgroup of SL₃(ℝ).
(b) Prove or disprove: U is normal.
*(c) Determine the center of U.
21. Prove by giving an explicit example that GL₂(ℝ) is not a normal subgroup of GL₂(ℂ).
22. Let φ: G → G′ be a surjective homomorphism.
(a) Assume that G is cyclic. Prove that G′ is cyclic.
(b) Assume that G is abelian. Prove that G′ is abelian.
23. Let φ: G → G′ be a surjective homomorphism, and let N be a normal subgroup of G.
Prove that φ(N) is a normal subgroup of G′.


5. Equivalence Relations and Partitions 


1. Prove that the nonempty fibres of a map form a partition of the domain.
2. Let S be a set of groups. Prove that the relation G ~ H if G is isomorphic to H is an
equivalence relation on S.
3. Determine the number of equivalence relations on a set of five elements.
4. Is the intersection R ∩ R′ of two equivalence relations R, R′ ⊂ S × S an equivalence
relation? Is the union?
5. Let H be a subgroup of a group G. Prove that the relation defined by the rule a ~ b if
b⁻¹a ∈ H is an equivalence relation on G.
6. (a) Prove that the relation x conjugate to y in a group G is an equivalence relation on G.
(b) Describe the elements a whose conjugacy class (= equivalence class) consists of the
element a alone.
7. Let R be a relation on the set ℝ of real numbers. We may view R as a subset of the (x, y)-
plane. Explain the geometric meaning of the reflexive and symmetric properties.
8. With each of the following subsets R of the (x, y)-plane, determine which of the axioms
(5.2) are satisfied and whether or not R is an equivalence relation on the set ℝ of real
numbers.
(a) R = {(s, s) | s ∈ ℝ}.
(b) R = empty set.
(c) R = locus {y = 0}.
(d) R = locus {xy + 1 = 0}.
(e) R = locus {x²y − xy² − x + y = 0}.
(f) R = locus {x² − xy + 2x − 2y = 0}.
9. Describe the smallest equivalence relation on the set of real numbers which contains the
line x − y = 1 in the (x, y)-plane, and sketch it.
10. Draw the fibres of the map from the (x, z)-plane to the y-axis defined by the map y = zx.
11. Work out rules, obtained from the rules on the integers, for addition and multiplication
on the set (5.8).
12. Prove that the cosets (5.14) are the fibres of the map φ.


6. Cosets


1. Determine the index [ℤ : nℤ].
2. Prove directly that distinct cosets do not overlap.
3. Prove that every group whose order is a power of a prime p contains an element of order
p.
4. Give an example showing that left cosets and right cosets of GL₂(ℝ) in GL₂(ℂ) are not
always equal.
5. Let H, K be subgroups of a group G of orders 3, 5 respectively. Prove that
H ∩ K = {1}.
6. Justify (6.15) carefully.
7. (a) Let G be an abelian group of odd order. Prove that the map φ: G → G defined by
φ(x) = x² is an automorphism.
(b) Generalize the result of (a).
8. Let W be the additive subgroup of ℝᵐ of solutions of a system of homogeneous linear
equations AX = 0. Show that the solutions of an inhomogeneous system AX = B form a
coset of W.
9. Let H be a subgroup of a group G. Prove that the number of left cosets is equal to the
number of right cosets (a) if G is finite and (b) in general.
10. (a) Prove that every subgroup of index 2 is normal.
(b) Give an example of a subgroup of index 3 which is not normal.
11. Classify groups of order 6 by analyzing the following three cases.
(a) G contains an element of order 6.
(b) G contains an element of order 3 but none of order 6.
(c) All elements of G have order 1 or 2.
12. Let G, H be the following subgroups of GL₂(ℝ):

[definitions of G and H lost in extraction]

An element of G can be represented by a point in the (x, y)-plane. Draw the partitions of
the plane into left and into right cosets of H.


7. Restriction of a Homomorphism to a Subgroup 


is 


2. 


S: 


Let G and G’ be finite groups whose orders have no common factor. Prove that the only 
homomorphism gy: G——>G' is the trivial one g(x) = | for all x. 


Give an example of a permutation of even order which is odd and an example of one 
which is even. 


(a) Let H and K be subgroups of a group G. Prove that the intersection xH ∩ yK of two
cosets of H and K is either empty or else is a coset of the subgroup H ∩ K.
(b) Prove that if H and K have finite index in G then H ∩ K also has finite index.

Prove Proposition (7.1).

Let H, N be subgroups of a group G, with N normal. Prove that HN = NH and that this
set is a subgroup.

Let φ: G → G' be a group homomorphism with kernel K, and let H be another subgroup
of G. Describe φ^{-1}(φ(H)) in terms of H and K.

Prove that a group of order 30 can have at most 7 subgroups of order 5.
*8. Prove the Correspondence Theorem: Let φ: G → G' be a surjective group homomor-
phism with kernel N. The set of subgroups H' of G' is in bijective correspondence with
the set of subgroups H of G which contain N, the correspondence being defined by the
maps H ⇝ φ(H) and φ^{-1}(H') ⇜ H'. Moreover, normal subgroups of G correspond
to normal subgroups of G'.

Let G and G' be cyclic groups of orders 12 and 6 generated by elements x, y respectively,
and let φ: G → G' be the map defined by φ(x^i) = y^i. Exhibit the correspondence
referred to in the previous problem explicitly.


8. Products of Groups


Let G, G' be groups. What is the order of the product group G × G'?

Is the symmetric group S3 a direct product of nontrivial groups?

Prove that a finite cyclic group of order rs is isomorphic to the product of cyclic groups
of orders r and s if and only if r and s have no common factor.

In each of the following cases, determine whether or not G is isomorphic to the product
of H and K.
(a) G = R^×, H = {±1}, K = {positive real numbers}.
(b) G = {invertible upper triangular 2 × 2 matrices}, H = {invertible diagonal matrices},
K = {upper triangular matrices with diagonal entries 1}.
(c) G = C^× and H = {unit circle}, K = {positive reals}.

Prove that the product of two infinite cyclic groups is not infinite cyclic.

Prove that the center of the product of two groups is the product of their centers.

(a) Let H, K be subgroups of a group G. Show that the set of products
HK = {hk | h ∈ H, k ∈ K} is a subgroup if and only if HK = KH.
(b) Give an example of a group G and two subgroups H, K such that HK is not a subgroup.

Let G be a group containing normal subgroups of orders 3 and 5 respectively. Prove that
G contains an element of order 15.

Let G be a finite group whose order is a product of two integers: n = ab. Let H, K be
subgroups of G of orders a and b respectively. Assume that H ∩ K = {1}. Prove that
HK = G. Is G isomorphic to the product group H × K?

Let x ∈ G have order m, and let y ∈ G' have order n. What is the order of (x, y) in
G × G'?

Let H be a subgroup of a group G, and let φ: G → H be a homomorphism whose
restriction to H is the identity map: φ(h) = h, if h ∈ H. Let N = ker φ.
(a) Prove that if G is abelian then it is isomorphic to the product group H × N.
(b) Find a bijective map G → H × N without the assumption that G is abelian, but
show by an example that G need not be isomorphic to the product group.




9. Modular Arithmetic 




Compute (7 + 14)(3 — 16) modulo 17. 
(a) Prove that the square a^2 of an integer a is congruent to 0 or 1 modulo 4.
(b) What are the possible values of a^2 modulo 8?

(a) Prove that 2 has no inverse modulo 6.
(b) Determine all integers n such that 2 has an inverse modulo n.

Prove that every integer a is congruent to the sum of its decimal digits modulo 9.

Solve the congruence 2x ≡ 5 (a) modulo 9 and (b) modulo 6.

Determine the integers n for which the congruences x + y ≡ 2, 2x − 3y ≡ 3 (modulo
n) have a solution.

Prove the associative and commutative laws for multiplication in Z/nZ.

Use Proposition (2.6) to prove the Chinese Remainder Theorem: Let m, n, a, b be in-
tegers, and assume that the greatest common divisor of m and n is 1. Then there is an
integer x such that x ≡ a (modulo m) and x ≡ b (modulo n).


10. Quotient Groups


Let G be the group of invertible real upper triangular 2 × 2 matrices. Determine whether
or not the following conditions describe normal subgroups H of G. If they do, use the
First Isomorphism Theorem to identify the quotient group G/H.
(a) a_11 = 1 (b) a_12 = 0 (c) a_11 = a_22 (d) a_11 = a_22 = 1


Write out the proof of (10.1) in terms of elements.

Let P be a partition of a group G with the property that for any pair of elements A, B of
the partition, the product set AB is contained entirely within another element C of the
partition. Let N be the element of P which contains 1. Prove that N is a normal subgroup
of G and that P is the set of its cosets.


(a) Consider the presentation (1.17) of the symmetric group S3. Let H be the subgroup
{1, y}. Compute the product sets (1H)(xH) and (1H)(x^2 H), and verify that they are
not cosets.

(b) Show that a cyclic group of order 6 has two generators satisfying the rules x^3 = 1,
y^2 = 1, yx = xy.

(c) Repeat the computation of (a), replacing the relations (1.18) by the relations given in
part (b). Explain.


Identify the quotient group R^×/P, where P denotes the subgroup of positive real numbers.

Let H = {±1, ±i} be the subgroup of G = C^× of fourth roots of unity. Describe the
cosets of H in G explicitly, and prove that G/H is isomorphic to G.

Find all normal subgroups N of the quaternion group H, and identify the quotients H/N.

Prove that the subset H of G = GL_n(R) of matrices whose determinant is positive forms
a normal subgroup, and describe the quotient group G/H.

Prove that the subset G × 1 of the product group G × G' is a normal subgroup isomor-
phic to G and that (G × G')/(G × 1) is isomorphic to G'.

Describe the quotient groups C^×/P and C^×/U, where U is the subgroup of complex
numbers of absolute value 1 and P denotes the positive reals.

Prove that the groups R^+/Z^+ and R^+/2πZ^+ are isomorphic.
Miscellaneous Problems

What is the product of all mth roots of unity in C?
Compute the group of automorphisms of the quaternion group. 


Prove that a group of even order contains an element of order 2.

Let K ⊂ H ⊂ G be subgroups of a finite group G. Prove the formula

[G : K] = [G : H][H : K].

A semigroup S is a set with an associative law of composition and with an identity. But
elements are not required to have inverses, so the cancellation law need not hold. The
semigroup S is said to be generated by an element s if the set {1, s, s^2, ...} of nonnegative
powers of s is the whole set S. For example, the relations s^2 = 1 and s^2 = s describe two
different semigroup structures on the set {1, s}. Define isomorphism of semigroups, and
describe all isomorphism classes of semigroups having a generator.


Let S be a semigroup with finitely many elements which satisfies the Cancellation Law
(1.12). Prove that S is a group.

Let a = (a_1, ..., a_k) and b = (b_1, ..., b_k) be points in k-dimensional space R^k. A path
from a to b is a continuous function on the interval [0, 1] with values in R^k, that is, a
function f: [0, 1] → R^k, sending t ⇝ f(t) = (x_1(t), ..., x_k(t)), such that f(0) = a and
f(1) = b. If S is a subset of R^k and if a, b ∈ S, we define a ~ b if a and b can be joined
by a path lying entirely in S.

(a) Show that this is an equivalence relation on S. Be careful to check that the paths you
construct stay within the set S.

(b) A subset S of R^k is called path connected if a ~ b for any two points a, b ∈ S.
Show that every subset S is partitioned into path-connected subsets with the property
that two points in different subsets can not be connected by a path in S.

(c) Which of the following loci in R^2 are path-connected? {x^2 + y^2 = 1}, {xy = 0},
{xy = 1}.

The set of n × n matrices can be identified with the space R^{n×n}. Let G be a subgroup of
GL_n(R). Prove each of the following.

(a) If A, B, C, D ∈ G, and if there are paths in G from A to B and from C to D, then there
is a path in G from AC to BD.

(b) The set of matrices which can be joined to the identity I forms a normal subgroup of
G (called the connected component of G).

(a) Using the fact that SL_n(R) is generated by elementary matrices of the first type (see
exercise 18, Section 2), prove that this group is path-connected.

(b) Show that GL_n(R) is a union of two path-connected subsets, and describe them.


Let H, K be subgroups of a group G, and let g ∈ G. The set
HgK = {x ∈ G | x = hgk for some h ∈ H, k ∈ K}


is called a double coset. 

(a) Prove that the double cosets partition G. 

(b) Do all double cosets have the same order? 

Let H be a subgroup of a group G. Show that the double cosets HgH are the left cosets
gH if H is normal, but that if H is not normal then there is a double coset which properly
contains a left coset.

Prove that the double cosets in GL_n(R) of the subgroups H = {lower triangular matrices}
and K = {upper triangular matrices} are the sets HPK, where P is a permutation matrix.
Vector Spaces


Immer mit den einfachsten Beispielen anfangen. 


David Hilbert 


1. REAL VECTOR SPACES


The basic models for vector spaces are the spaces of n-dimensional row or column 
vectors: 


R^n: the set of row vectors v = (a_1, ..., a_n), or
the set of column vectors

$$v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.$$


Though row vectors take less space to write, the definition of matrix multiplication 
makes column vectors more convenient for us. So we will work with column vec- 
tors most of the time. To save space, we will occasionally write a column vector in 
the form (a_1, ..., a_n)^t.

For the present we will study only two operations: 


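(1.1) vector addition:

$$\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \begin{pmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{pmatrix},$$

and scalar multiplication:

$$c \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} ca_1 \\ \vdots \\ ca_n \end{pmatrix}.$$
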
These operations make R^n into a vector space. Before going to the formal definition
of a vector space, let us look at some other examples: nonempty subsets of R^n
closed under the operations (1.1). Such a subset is called a subspace.


(1.2) Example. The subspaces W of the space R^2 are of three types:


(i) the zero vector alone: W = {0};
(ii) the vectors lying on a line L through the origin;
(iii) the whole space: W = R^2.



This can be seen from the parallelogram law for addition of vectors. If W contains
two vectors w_1, w_2 not lying on one line, then every vector v can be obtained from
these two vectors as a “linear combination”

c_1 w_1 + c_2 w_2,

where c_1, c_2 are scalars. So W = R^2 in this case. If W does not contain two such
vectors, then we are in one of the remaining cases. □


Similarly, it can be shown that the subspaces of R^3 are of four types:

(i) the zero vector;
(ii) the vectors lying on a line through the origin;
(iii) the vectors lying in a plane through the origin;
(iv) the whole space R^3.


This classification of subspaces of R^2 and R^3 will be clarified in Section 4 by the
concept of dimension.

Systems of homogeneous linear equations furnish many examples. The set of
solutions of such a system is always a subspace. For, if we write the system in matrix
notation as AX = 0, where A is an m × n matrix and X is a column vector, then it is
clear that

(a) AX = 0 and AY = 0 imply A(X + Y) = 0. In other words, if X and Y are solu-
tions, so is X + Y.
(b) AX = 0 implies A(cX) = 0: if X is a solution, so is cX.
For example, let W be the set of solutions of the equation

(1.3) 2x_1 − x_2 − 2x_3 = 0, or AX = 0,

where A = [2 −1 −2]. This space is the set of vectors lying in the plane through the
origin and orthogonal to A. Every solution is a linear combination c_1 w_1 + c_2 w_2 of
two particular solutions w_1, w_2. Most pairs of solutions, for example

(1.4)
$$w_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad w_2 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix},$$

will span the space of solutions in this way. Thus every solution has the form

(1.5)
$$c_1 w_1 + c_2 w_2 = \begin{pmatrix} c_1 + c_2 \\ 2c_2 \\ c_1 \end{pmatrix},$$

where c_1, c_2 are arbitrary constants. Another choice of the particular solutions w_1, w_2
would result in a different but equivalent description of the space of all solutions.
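
As a quick numerical check of (1.3) through (1.5), here is a minimal Python sketch. It is an illustration only; the helper names `dot`, `combo`, and `coefficients` are assumptions introduced here and are not notation from the text.

```python
from fractions import Fraction

A = [2, -1, -2]   # coefficient row of equation (1.3)
w1 = [1, 0, 1]    # the particular solutions (1.4)
w2 = [1, 2, 0]

def dot(row, vec):
    """Evaluate the left side of AX = 0 for a single-row matrix A."""
    return sum(r * x for r, x in zip(row, vec))

def combo(c1, c2):
    """Return c1*w1 + c2*w2, which by (1.5) equals (c1 + c2, 2*c2, c1)."""
    return [c1 * a + c2 * b for a, b in zip(w1, w2)]

def coefficients(x):
    """Read off (c1, c2) from a solution x using (1.5): c1 = x3, c2 = x2 / 2."""
    return Fraction(x[2]), Fraction(x[1], 2)

# Both particular solutions satisfy the equation, and so does any combination.
print(dot(A, w1), dot(A, w2))
x = combo(3, -5)
print(x, dot(A, x))

# Conversely, a solution such as (0, 2, -1) is recovered from its coefficients.
c1, c2 = coefficients([0, 2, -1])
print(c1, c2, combo(c1, c2))
```

The function `coefficients` only makes sense for vectors that already solve (1.3); it does not test membership in W.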



(1.6) Definition. A real vector space is a set V together with two laws of compo-
sition:

(a) Addition: V × V → V, written v, w ⇝ v + w
(b) Scalar multiplication: R × V → V, written c, v ⇝ cv

These laws of composition must satisfy the following axioms:

(i) Addition makes V into an abelian group V^+.
(ii) Scalar multiplication is associative with multiplication of real numbers:

(ab)v = a(bv).

(iii) Scalar multiplication by the real number 1 is the identity operation:

1v = v.

(iv) Two distributive laws hold:

(a + b)v = av + bv
a(v + w) = av + aw.

Of course all the axioms should be quantified universally; that is, they are assumed
to hold for all a, b ∈ R and all v, w ∈ V.


The identity element for the addition law in V is denoted by 0, or by 0_V if there
is danger of confusing the zero vector with the number zero.
Notice that scalar multiplication associates to every pair consisting of a real
number c and a vector v another vector cv. Such a rule is called an external law of
composition on the vector space.

Multiplication of two vectors is not a part of the structure, though various
products, such as the cross product of vectors in R^3, can be defined. These products
aren't completely intrinsic; they depend on choosing coordinates. So they are con-
sidered to be additional structure on the vector space.

Read axiom (ii) carefully. The left side means multiply a and b as real num- 
bers, then scalar multiply ab and v, to get a vector. On the right side, both opera- 
tions are scalar multiplication. 

The two laws of composition are related by the essential distributive laws. 
Note that in the first distributive law the symbol + on the left stands for addition of 
real numbers, while on the right, it stands for addition of vectors. 


(1.7) Proposition. The following identities hold in a vector space V: 


(a) 0_R v = 0_V, for all v ∈ V,
(b) c 0_V = 0_V, for all c ∈ R,
(c) (−1)v = −v, for all v ∈ V.


Proof. To see (a), we use the distributive law to write

0v + 0v = (0 + 0)v = 0v = 0v + 0.

Cancelling 0v from both sides, we obtain 0v = 0. Please go through this carefully,
noting which symbols 0 refer to the number and which refer to the vector.
Similarly, c0 + c0 = c(0 + 0) = c0. Hence c0 = 0. Finally,

v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = 0.

Hence (−1)v is the additive inverse of v. □


(1.8) Examples. 


(a) A subspace of R^n is a vector space, with the laws of composition induced from
those on R^n.

(b) Let V = C be the set of complex numbers. Forget multiplication of complex
numbers, and keep only addition α + β and multiplication cα of a complex
number α by a real number c. These operations make C into a real vector
space.

(c) The set of real polynomials p(x) = a_n x^n + ... + a_0 is a vector space, with
addition of polynomials and multiplication of polynomials by scalars as its
laws of composition.

(d) Let V be the set of continuous real-valued functions on the interval [0, 1]. Look 
only at the operations of addition of functions f + g and multiplication of 
functions by numbers cf. This makes V a real vector space. 
Note that each of our examples has more structure than we look at when we
view it as a vector space. This is typical. Any particular example is sure to have 
some extra features which distinguish it from others, but this is not a drawback of the 
definition. On the contrary, the strength of the abstract approach lies in the fact that 
consequences of the general axioms can be applied to many different examples. 


2. ABSTRACT FIELDS 


It is convenient to treat the real and complex cases simultaneously in linear algebra. 
This can be done by listing the properties of the “scalars” which are needed axiomat- 
ically, and doing so leads to the notion of a field. 

It used to be customary to speak only of subfields of the complex numbers. A
subfield of C is any subset which is closed under the four operations addition, sub-
traction, multiplication, and division, and which contains 1. In other words, F is a
subfield of C if the following properties hold:


(2.1)

(a) If a, b ∈ F, then a + b ∈ F.
(b) If a ∈ F, then −a ∈ F.
(c) If a, b ∈ F, then ab ∈ F.
(d) If a ∈ F and a ≠ 0, then a^{-1} ∈ F.
(e) 1 ∈ F.


Note that we can use axioms (a), (b), and (e) to conclude that 1 − 1 = 0 is an ele-
ment of F. Thus F is a subset which is a subgroup of C^+ under addition and such
that F − {0} = F^× is a subgroup of C^× under multiplication. Conversely, any such
subset is a subfield.

Here are some examples of subfields of C: 


(2.2) Examples. 


(a) F = R, the field of real numbers. 
(b) F = Q, the field of rational numbers (= fractions of integers). 


(c) F = Q[√2], the field of all complex numbers of the form a + b√2, where
a, b ∈ Q.


It is a good exercise to check axioms (2.1) for the last example. 
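
To see what the check involves for Q[√2], the following Python sketch models an element a + b√2 as a pair of rationals. It is a hedged illustration, not a proof: the class name `QSqrt2` and its methods are assumptions made here, and the inverse uses the standard device of multiplying by the conjugate a − b√2.

```python
from fractions import Fraction

class QSqrt2:
    """An element a + b*sqrt(2) of Q[sqrt 2], stored as rationals (a, b)."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __add__(self, other):          # axiom (2.1a)
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __neg__(self):                 # axiom (2.1b)
        return QSqrt2(-self.a, -self.b)

    def __mul__(self, other):          # axiom (2.1c), using sqrt(2)^2 = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):                 # axiom (2.1d)
        # (a + b*sqrt2)(a - b*sqrt2) = a^2 - 2b^2, which is a nonzero rational
        # unless a = b = 0, because sqrt(2) is irrational.
        norm = self.a ** 2 - 2 * self.b ** 2
        if norm == 0:
            raise ZeroDivisionError("zero has no inverse")
        return QSqrt2(self.a / norm, -self.b / norm)

    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"

one = QSqrt2(1)                        # axiom (2.1e)
x = QSqrt2(1, 1)                       # 1 + sqrt(2)
print(x.inverse())                     # -1 + 1*sqrt(2)
print(x * x.inverse() == one)
```

The point of the sketch is that every operation returns another pair of rationals, which is exactly the closure that axioms (2.1) ask for.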

These days, it is customary to introduce fields abstractly. The notion of an ab- 
stract field is harder to grasp than that of a subfield of C, but it contains important 
new classes of fields, including finite fields. 
(2.3) Definition. A field F is a set together with two laws of composition

F × F → F, written a, b ⇝ a + b, and F × F → F, written a, b ⇝ ab,

called addition and multiplication, and satisfying the following axioms:

(i) Addition makes F into an abelian group F^+. Its identity element is denoted
by 0.

(ii) Multiplication is associative and commutative and makes F^× = F − {0} into a
group. Its identity element is denoted by 1.

(iii) Distributive law: For all a, b, c ∈ F, (a + b)c = ac + bc.


The first two axioms describe properties of the two laws of composition, addition 
and multiplication, separately. The third axiom, the distributive law, is the one 
which relates addition to multiplication. This axiom is crucial, because if the two 
laws were unrelated, we could just as well study each of them separately. Of course 
we know that the real numbers satisfy these axioms, but the fact that they are all that 
is needed for arithmetic operations can only be understood after some experience in 
working with them. 

One can operate with matrices A whose entries a_ij are in any field F. The dis-
cussion of Chapter 1 can be repeated without change, and you should go back to
look at this material again with this in mind.

The simplest examples of fields besides the subfields of the complex numbers
are certain finite fields called the prime fields, which we will now describe. We saw
in Section 9 of Chapter 2 that the set Z/nZ of congruence classes modulo n has laws
of addition and multiplication derived from addition and multiplication of integers.
Now all of the axioms for a field hold for the integers, except for the existence of
multiplicative inverses in axiom (2.3ii). The integers are not closed under division.
And as we have already remarked, such axioms carry over to addition and multipli-
cation of congruence classes. But there is no reason to suppose that multiplicative in-
verses will exist for congruence classes, and in fact they need not. The class of 2, for
example, does not have a multiplicative inverse modulo 6. So it is a surprising fact
that if p is a prime integer, then all nonzero congruence classes modulo p have in-
verses, and therefore the set Z/pZ is a field. This field is called a prime field and is
usually denoted by F_p.


(2.5) Theorem. Let p be a prime integer. Every nonzero congruence class ā
(modulo p) has a multiplicative inverse, and hence F_p is a field with p elements.

The theorem can also be stated as follows:


(2.6) Let p be a prime, and let a be any integer not divisible by p.
There is an integer b such that ab ≡ 1 (modulo p).


For ab ≡ 1 (modulo p) is the same as ā b̄ = 1 in F_p, because ā b̄ is the class of ab.
This means that b̄ is the multiplicative inverse of ā.

For example, let p = 13 and ā = 6. Then ā^{-1} = 11 because


6 · 11 = 66 ≡ 1 (modulo 13).


Finding the inverse of a congruence class ā (modulo p) is not easy in general, but it
can be done by trial and error if p is small. A systematic way is to compute the pow-
ers of ā. Since every nonzero congruence class has an inverse, the set of all of them
forms a finite group of order p − 1, usually denoted by F_p^×. So every element ā has
finite order dividing p − 1. Thus if p = 13 and ā = 3, we find ā^2 = 9, and
ā^3 = 27 = 1, which shows that ā has order 3. We are lucky: ā^{-1} = ā^2 = 9. On the
other hand, if we had tried this method with ā = 6, we would have found that 6 has
order 12. The computation would have been lengthy.
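
The method of computing powers can be sketched in a few lines of Python. This is only an illustration of the procedure described above; the function name `inverse_by_powers` is an assumption made here.

```python
def inverse_by_powers(a, p):
    """Return (order, inverse) of the class of a modulo a prime p.

    The successive powers a, a^2, a^3, ... are computed until 1 appears.
    If a^k = 1, then a^(k-1) is the inverse of a.
    """
    a %= p
    if a == 0:
        raise ValueError("the zero class has no inverse")
    power, previous, order = a, 1, 1
    while power != 1:
        previous = power            # remember a^(order)
        power = (power * a) % p     # advance to a^(order + 1)
        order += 1
    return order, previous

print(inverse_by_powers(3, 13))   # the lucky case: order 3, inverse 9
print(inverse_by_powers(6, 13))   # the lengthy case: order 12, inverse 11
```

The loop terminates for a prime p because, by Theorem (2.5), every nonzero class has finite order dividing p − 1.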


Proof of Theorem (2.5). Let ā ∈ F_p be any nonzero element, and let us use
the method just discussed to show that ā has an inverse. We consider the powers
1, ā, ā^2, ā^3, .... Since there are infinitely many powers and only finitely many ele-
ments in F_p, there must be two powers which are equal, say ā^m = ā^n, where
m < n. At this point, we would like to cancel ā^m, to obtain 1 = ā^{n−m}. Once this
cancellation is justified, we will have shown that ā^{n−m−1} is the inverse of ā. This
will complete the proof.

Here is the cancellation law we need: 


(2.7) Lemma. Cancellation Law: Let ā, c̄, d̄ be elements of F_p with ā ≠ 0. If
ā c̄ = ā d̄, then c̄ = d̄.


Proof. Set b̄ = c̄ − d̄. Then the statement of the lemma becomes: If ā b̄ = 0
and ā ≠ 0, then b̄ = 0. To prove this, we represent the congruence classes ā, b̄ by
integers a, b. Then what has to be shown is the following intuitively plausible fact:


(2.8) Lemma. Let p be a prime integer and let a, b be integers. If p divides the 
product ab, then p divides a or p divides b. 


Proof. Suppose that p does not divide a, but that p divides ab. We must show
that p divides b. Since p is a prime, 1 and p are the only positive integers which di-
vide it. Since p does not divide a, the only common divisor of p and a is 1. So 1 is
their greatest common divisor. By Proposition (2.6) of Chapter 2, there are integers
r, s so that 1 = rp + sa. Multiply both sides by b: b = rpb + sab. Both of the
terms on the right side of this equality are divisible by p; hence the left side b is di-
visible by p too, as was to be shown. □
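
The relation 1 = rp + sa used in this proof also gives a direct way to find inverses: reading it modulo p shows that s represents the inverse of ā. The sketch below uses the extended Euclidean algorithm to produce r and s. It is an illustration under that assumption; the text itself only cites the existence of r, s, and the function names are chosen here.

```python
def extended_gcd(a, b):
    """Return (g, r, s) with g = gcd(a, b) and g = r*a + s*b."""
    old_rem, rem = a, b
    old_r, r = 1, 0
    old_s, s = 0, 1
    while rem != 0:
        q = old_rem // rem
        old_rem, rem = rem, old_rem - q * rem
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    return old_rem, old_r, old_s

def inverse_mod(a, p):
    """Return b with a*b congruent to 1 modulo p, using 1 = r*p + s*a."""
    g, r, s = extended_gcd(p, a % p)
    if g != 1:
        raise ValueError("a and p have a common factor, so no inverse exists")
    return s % p

print(inverse_mod(6, 13))   # 11, as in the example with p = 13
print(inverse_mod(3, 13))   # 9
```

Unlike the method of powers, this does not require computing the order of ā, so it stays short even when the order is as large as p − 1.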


As with congruences in general, computations in the field F_p can be made by
working with integers, except that division can not be carried out in the integers.
This difficulty can often be handled by putting everything on a common denomina-
tor in such a way that the required division is left until the end. For example, suppose
we ask for solutions of a system of n linear equations in n unknowns, in the field F_p.


We represent the system of equations by an integer system, choosing representatives
for the residue classes in a convenient way. Say that the integer system is AX = B,
where A is an n × n integer matrix and B is an integer column vector. Then to solve
the system in F_p, we try to invert the matrix A modulo p. Cramer's Rule,
(adj A)A = δI, where δ = det A, is a formula valid in the integers [Chapter 1 (5.7)],
and therefore it also holds in F_p when the matrix entries are replaced by their con-
gruence classes. If the residue class of δ is not zero, then we can invert the matrix A
in F_p by computing δ^{-1}(adj A).


(2.9) Corollary. Consider a system AX = B of n linear equations in n unknowns
where the entries of A, B are in F_p. The system has a unique solution in F_p if
det A ≠ 0 in F_p.


For example, consider the system of linear equations AX = B, where

$$A = \begin{pmatrix} 8 & 3 \\ 2 & 6 \end{pmatrix}, \quad B = \begin{pmatrix} 3 \\ -1 \end{pmatrix}.$$


Since the coefficients are integers, they define a system of equations in F_p for any
prime p. The determinant of A is 42, so the system has a unique solution in F_p for all
p different from 2, 3 and 7. Thus if p = 13, we find det A = 3 when evaluated
(modulo 13). We already saw that 3^{-1} = 9 in F_13. So we can use Cramer's Rule to
compute

$$A^{-1} = \begin{pmatrix} 2 & -1 \\ 8 & 7 \end{pmatrix} \quad \text{and} \quad X = A^{-1}B = \begin{pmatrix} 7 \\ 4 \end{pmatrix} \quad \text{in } F_{13}.$$

The system has no solution in F_2 or F_3. It happens to have solutions in F_7, though
det A = 0 in that field.
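
The computation with Cramer's Rule modulo p can be sketched as follows for 2 × 2 systems. The sketch assumes, for illustration, the integer system A = [[8, 3], [2, 6]], B = (3, −1)^t, which has det A = 42; the function names are hypothetical, and the three-argument `pow` with exponent −1 requires Python 3.8 or later.

```python
def solve_2x2_mod_p(A, B, p):
    """Solve AX = B in F_p via A^{-1} = det^{-1} * adj(A); None if det A = 0 in F_p."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % p
    if det == 0:
        return None                      # Corollary (2.9) does not apply
    det_inv = pow(det, -1, p)            # inverse of det A in F_p (Python 3.8+)
    adj = [[d, -b], [-c, a]]             # adjoint of a 2 x 2 matrix
    A_inv = [[(det_inv * e) % p for e in row] for row in adj]
    X = [sum(A_inv[i][j] * B[j] for j in range(2)) % p for i in range(2)]
    return A_inv, X

def solutions_mod_p(A, B, p):
    """Brute-force list of all solutions (x, y) in F_p, for the singular primes."""
    return [(x, y) for x in range(p) for y in range(p)
            if all((row[0] * x + row[1] * y - rhs) % p == 0
                   for row, rhs in zip(A, B))]

A = [[8, 3], [2, 6]]    # assumed illustrative system with det A = 42
B = [3, -1]

print(solve_2x2_mod_p(A, B, 13))         # entries are reduced to 0..12, so -1 shows as 12
for p in (2, 3, 7):
    print(p, len(solutions_mod_p(A, B, p)))
```

For the primes dividing 42 the first function returns None, and the brute-force count shows whether solutions exist anyway.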

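We remark in passing that invertible matrices with entries in the field F_p pro-
vide new examples of finite groups, namely the general linear groups over finite fields:


GL_n(F_p) = {n × n invertible matrices with entries in F_p}.


The smallest of these is the group GL_2(F_2) of invertible 2 × 2 matrices with entries
(modulo 2), which consists of the six matrices


(2.10)
$$GL_2(F_2) = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \right\}.$$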
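
The list (2.10) can be confirmed by enumeration. The following Python sketch, with the hypothetical function name `gl2`, runs through all 2 × 2 matrices with entries in {0, ..., p − 1} and keeps those whose determinant is nonzero modulo p.

```python
from itertools import product

def gl2(p):
    """Return all invertible 2 x 2 matrices over F_p as tuples of rows."""
    matrices = []
    for a, b, c, d in product(range(p), repeat=4):
        if (a * d - b * c) % p != 0:       # invertible iff det is nonzero in F_p
            matrices.append(((a, b), (c, d)))
    return matrices

for m in gl2(2):
    print(m)
print(len(gl2(2)), len(gl2(3)))
```

Enumeration is practical only for small p, since there are p^4 candidate matrices.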


There is one property of the finite fields F = F_p which distinguishes them
from subfields of C and which affects computations occasionally. This property is
that adding 1 to itself a certain number of times (in fact p times) gives 0. A field F
is said to have characteristic p if 1 + ... + 1 (p terms) = 0 in F, and if p is the
smallest positive integer with that property. In other words, the characteristic of F is
the order of 1, as an element of the additive group F^+, provided that the order
is finite (Chapter 2, Section 2). In case the order is infinite, that is, 1 + ... + 1 is
never 0 in F, the field is, paradoxically, said to have characteristic zero. Thus
subfields of C have characteristic zero, while the prime field F_p has characteristic
p. It can be shown that the characteristic of any field F is either zero or a prime
number.
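
For the prime fields the definition can be checked directly by adding 1 to itself until 0 appears. The short Python sketch below does this for integers modulo p; the function name is an assumption made here, and the sketch says nothing about fields of characteristic zero, where the loop would never end.

```python
def characteristic_of_prime_field(p):
    """Return the additive order of 1 in Z/pZ by repeated addition."""
    total, terms = 1 % p, 1
    while total != 0:
        total = (total + 1) % p   # add one more copy of 1
        terms += 1
    return terms

print([characteristic_of_prime_field(p) for p in (2, 3, 5, 13)])
```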


Now let F be an arbitrary field. A vector space over a field F is defined as in
(1.6), with F replacing R.


(2.11) Definition. A vector space V over a field F is a set together with two laws
of composition:


(a) addition: V × V → V, written v, w ⇝ v + w,
(b) scalar multiplication: F × V → V, written c, v ⇝ cv,


and satisfying the following axioms:


(i) Addition makes V into a commutative group V^+.
(ii) Scalar multiplication is associative with multiplication in F:


(ab)v = a(bv), for all a, b ∈ F and v ∈ V.


(iii) The element 1 acts as identity: 1v = v, for all v ∈ V.
(iv) Two distributive laws hold:


(a + b)v = av + bv and a(v + w) = av + aw,
for all a, b ∈ F and v, w ∈ V.


All of Section 1 can be repeated, replacing the field R by F. Thus the space F^n
of row vectors (a_1, ..., a_n), a_i ∈ F, is a vector space over F and so on.

It is important to note that the definition of vector space includes implicitly the
choice of a field F. The elements of this field F are often called scalars. We usually
keep this field fixed. Of course, if V is a complex vector space, meaning a vector
space over the field C, and if F ⊂ C is any subfield, then V is also naturally a vector
space over F because cv is defined for all c ∈ F. But we consider the vector space
structure to have changed when we restrict the scalars from C to F.

Two important concepts analogous to subgroups and isomorphisms of groups 
are the concepts of subspace and of isomorphism of vector spaces. We have already 
defined subspaces for complex vector spaces, and the definition is the same for any 
field. A subspace W of a vector space V (over a field F) is a subset with the follow-
ing properties:


(2.12)


(a) If w, w' ∈ W, then w + w' ∈ W.
(b) If w ∈ W and c ∈ F, then cw ∈ W.
(c) 0 ∈ W.


A subspace W is called a proper subspace of V if it is neither the whole space V nor
the zero subspace {0}.

It is easy to see that a subspace is just a subset on which the laws of composi- 
tion induce the structure of vector space. 

As in Section 1, the space of all solutions of a system of m linear equations in
n unknowns


AX = 0,


with coefficients in F, is an example of a subspace of the space F^n.


(2.13) Definition. An isomorphism φ from a vector space V to a vector space V',
both over the same field F, is a bijective map φ: V → V' compatible with the laws
of composition, that is, a bijective map satisfying


(a) φ(v + v') = φ(v) + φ(v') and (b) φ(cv) = cφ(v),
for all v, v' ∈ V and all c ∈ F.


(2.14) Examples. 


(a) The space F^n of n-dimensional row vectors is isomorphic to the space of n-
dimensional column vectors.

(b) View the set of complex numbers C as a real vector space, as in (1.8b). Then
the map φ: R^2 → C sending (a, b) ⇝ a + bi is an isomorphism.


3. BASES AND DIMENSION


In this section we discuss the terminology used when working with the two opera- 
tions, addition and scalar multiplication, in an abstractly given vector space. The 
new concepts are span, linear independence, and basis. 

It will be convenient to work with ordered sets of vectors here. The ordering 
will be unimportant much of the time, but it will enter in an essential way when we 
make explicit computations. We’ve been putting curly brackets around unordered 
sets, so in order to distinguish ordered from unordered sets, let us enclose ordered 
sets with round brackets. Thus the ordered set (a, b) is considered different from the 
ordered set (b, a), whereas the unordered sets {a, b} and {b, a} are considered equal. 
Repetitions will also be allowed in an ordered set. So (a, a, b) is considered an or- 
dered set, and it is different from (a, b), in contrast to the convention for unordered 
sets, where {a, a, b} would denote the same set as {a, b}.


Let V be a vector space over a field F, and let (v_1, ..., v_n) be an ordered set of
elements of V. A linear combination of (v_1, ..., v_n) is any vector of the form

(3.1) w = c_1 v_1 + c_2 v_2 + ... + c_n v_n, c_i ∈ F.

For example, suppose that the ordered set consists of the two vectors in R^3
considered in (1.4): v_1 = (1, 0, 1)^t and v_2 = (1, 2, 0)^t. Then a linear combination
will have the form (1.5): (c_1 + c_2, 2c_2, c_1)^t. The vector (3, 4, 1)^t = v_1 + 2v_2 is one
such linear combination.

A solution X of a system of linear equations written in the matrix form AX = B
[Chapter 1 (1.9)] exhibits the column vector B as a linear combination of the
columns of the matrix A. The coefficients are the entries of the vector X.

A linear combination of a single vector (v) is just a multiple cv of v.

The set of all vectors w which are linear combinations of (v_1, ..., v_n) forms a
subspace W of V, called the subspace spanned by the set: If w (3.1) and
w' = c_1'v_1 + ... + c_n'v_n are elements of W, then so is

w + w' = (c_1 + c_1')v_1 + ... + (c_n + c_n')v_n,

and if a ∈ F, then aw = (ac_1)v_1 + ... + (ac_n)v_n is in W. So w + w' and aw are in
W. Finally, 0 = 0v_1 + ... + 0v_n ∈ W. This shows that the conditions of (2.12)
hold.

The space spanned by a set S will often be denoted by Span S. Clearly, Span S
is the smallest subspace of V which contains S. We could also call it the subspace
generated by S. Note that the order is irrelevant here. The span of S is the same as
the span of any reordering of S.

One can also define the span of an infinite set of vectors. We will discuss this
in Section 5. In this section, let us assume that our sets are finite.


(3.2) Proposition. Let S be a set of vectors of V, and let W be a subspace of V. If
S ⊂ W, then Span S ⊂ W.


This is obvious, because W is closed under addition and scalar multiplication. If
S ⊂ W, then any linear combination of vectors of S is in W too. □


A linear relation among vectors v_1, ..., v_n is any relation of the form

(3.3)   c_1v_1 + c_2v_2 + ... + c_nv_n = 0,

where the coefficients c_i are in F. An ordered set (v_1, ..., v_n) of vectors is called
linearly independent if there is no linear relation among the vectors in the set, except
for the trivial one in which all the coefficients c_i are zero. It is useful to state this
condition positively:

(3.4)   Let (v_1, ..., v_n) be a linearly independent set. Then
        from the equation c_1v_1 + ... + c_nv_n = 0,
        we can conclude that c_i = 0 for every i = 1, ..., n.

Conversely, if (3.4) holds, then the vectors are linearly independent.
The vectors (1.4) are linearly independent.

Note that a linearly independent set S can not have any repetitions. For if two
vectors v_i, v_j of S are equal, then

        v_i - v_j = 0

is a linear relation of the form (3.3), the other coefficients being zero. Also, no
vector v_i of a linearly independent family may be zero, because if it is, then v_i = 0
is a linear relation.

A set which is not linearly independent is called linearly dependent.

If V is the space F^m and if the vectors (v_1, ..., v_n) are given explicitly, we can
decide linear independence by solving a system of homogeneous linear equations.
For to say that a linear combination x_1v_1 + ... + x_nv_n is zero means that each
coordinate is zero, and this leads to m equations in the n unknowns x_i. For example,
consider the set of three vectors

              [ 1 ]         [ 1 ]         [ 2 ]
(3.5)   v_1 = [ 0 ],  v_2 = [ 2 ],  v_3 = [ 1 ].
              [ 1 ]         [ 0 ]         [ 2 ]

Let A denote the matrix whose columns are these vectors:

            [ 1  1  2 ]
(3.6)   A = [ 0  2  1 ].
            [ 1  0  2 ]

A general linear combination of the vectors will have the form x_1v_1 + x_2v_2 + x_3v_3.
Bringing the scalar coefficients to the other side, we can write this linear combination
in the form AX, where X = (x_1, x_2, x_3)^t. Since det A = 1, the equation AX = 0
has only the trivial solution, and this shows that (v_1, v_2, v_3) is a linearly independent
set. On the other hand, if we add an arbitrary fourth vector v_4 to this set, the result
will be linearly dependent, because every system of three homogeneous equations in
four unknowns has a nontrivial solution [Chapter 1 (2.17)].
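
The test just described can also be carried out mechanically. The following is a minimal sketch in Python, assuming vectors with rational entries; the helper names rank and is_independent are ours, chosen for illustration. It row-reduces the given vectors with exact fractions and calls a set independent when the rank equals the number of vectors, which is the same criterion as "AX = 0 has only the trivial solution".

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, computed by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_independent(vectors):
    """A set is linearly independent exactly when its rank equals its size."""
    return rank(vectors) == len(vectors)

v1, v2, v3 = (1, 0, 1), (1, 2, 0), (2, 1, 2)      # the vectors of (3.5)
print(is_independent([v1, v2, v3]))               # three vectors in Q^3
print(is_independent([v1, v2, v3, (5, 7, 9)]))    # any fourth vector forces dependence
```

The fourth vector (5, 7, 9) is an arbitrary choice; any other vector of length three would give the same answer for the second call.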
Here are some elementary facts about linear independence.

(3.7) Proposition.

(a) Any reordering of a linearly independent set is linearly independent.
(b) If v_1 ∈ V is a nonzero vector, then the set (v_1) is linearly independent.
(c) A set (v_1, v_2) of two vectors is linearly dependent if and only if either
    v_1 = 0, or else v_2 is a multiple of v_1.

Let us verify the third of these assertions: Assume (v_1, v_2) dependent. Let the
relation be c_1v_1 + c_2v_2 = 0, where c_1, c_2 are not both zero. If c_2 ≠ 0, we can
solve for v_2:

        v_2 = (-c_1/c_2) v_1.

In this case v_2 is a multiple of v_1. If c_2 = 0, then c_1 ≠ 0 and the equation shows that
v_1 = 0. Conversely, if v_2 = cv_1, then the relation cv_1 - v_2 = 0 shows that the set
(v_1, v_2) is linearly dependent, and if v_1 = 0, then the relation v_1 + 0v_2 = 0 shows
the same thing. □


A set of vectors (v_1, ..., v_n) which is linearly independent and which also spans
V is called a basis. For example, the vectors (1.4) form a basis for the space of
solutions of the linear equation (1.3). We will often use a symbol such as B to denote a
basis.

Let B = (v_1, ..., v_n) be a basis. Then since B spans V, every w ∈ V can be
written as a linear combination (3.1). Since B is linearly independent, this
expression is unique.


(3.8) Proposition. The set B = (v_1, ..., v_n) is a basis if and only if every vector
w ∈ V can be written in a unique way in the form (3.1).

Proof. Suppose that B is a basis and that w is written as a linear combination in
two ways, say (3.1) and also w = c_1'v_1 + ... + c_n'v_n. Then

        0 = w - w = (c_1 - c_1')v_1 + ... + (c_n - c_n')v_n.

Hence by (3.4) c_1 - c_1' = 0, ..., c_n - c_n' = 0. Thus the two linear combinations
are the same. On the other hand, the definition of linear independence for B can be
restated by saying that 0 has only one expression as a linear combination. This
proves the converse. □


(3.9) Example. Let V = F^n be the space of column vectors, and let e_i denote the
column vector with 1 in the ith position and zeros elsewhere. The n vectors e_i form
a basis for F^n called the standard basis. This basis was introduced before, in
Chapter 1, Section 4. We will denote it by E. Every vector X = (x_1, ..., x_n)^t has the
unique expression

        X = x_1e_1 + ... + x_ne_n

as a linear combination of E = (e_1, ..., e_n).
The set (3.5) is another basis of R^3.

We now discuss the main facts (3.15-3.17) which relate the three notions of
span, linear independence, and basis.


(3.10) Proposition. Let L be a linearly independent ordered set in V, and let 
v € V be any vector. Then the ordered set L' = (L, v) obtained by adding v to L is 
linearly independent if and only if v is not in the subspace spanned by L. 


Proof. Say that L = (v_1, ..., v_r). If v ∈ Span L, then v = c_1v_1 + ... + c_rv_r
for some c_i ∈ F. Hence

        c_1v_1 + ... + c_rv_r + (-1)v = 0

is a linear relation among the vectors of L', and the coefficient -1 is not zero. Thus
L' is linearly dependent.

Conversely, suppose that L' is linearly dependent, so that there is some linear
relation

        c_1v_1 + ... + c_rv_r + bv = 0,

in which not all coefficients are zero. Then certainly b ≠ 0. For, if b were zero, the
expression would reduce to

        c_1v_1 + ... + c_rv_r = 0.

Since L is assumed to be linearly independent, we could conclude that
c_1 = ... = c_r = 0 too, contrary to hypothesis. Now that we know b ≠ 0, we can
solve for v:

        v = (-c_1/b)v_1 + ... + (-c_r/b)v_r.

Thus v ∈ Span L. □


(3.11) Proposition. Let S be an ordered set of vectors, let v ∈ V be any vector,
and let S' = (S, v). Then Span S = Span S' if and only if v ∈ Span S.

Proof. By definition, v ∈ Span S'. So if v ∉ Span S, then Span S ≠
Span S'. Conversely, if v ∈ Span S, then S' ⊂ Span S; hence Span S' ⊂ Span S
(3.2). The fact that Span S' ⊃ Span S is trivial, and so Span S' = Span S. □


(3.12) Definition. A vector space V is called finite-dimensional if there is some 
finite set S which spans V. 

For the rest of this section, we assume that our given vector space V is finite- 
dimensional. 


(3.13) Proposition. Any finite set S which spans V contains a basis. In particular,
any finite-dimensional vector space has a basis.

Proof. Suppose S = (v_1, ..., v_n) and that S is not linearly independent. Then
there is a linear relation

        c_1v_1 + ... + c_nv_n = 0

in which some c_i is not zero, say c_n ≠ 0. Then we may solve for v_n:

        v_n = (-c_1/c_n)v_1 + ... + (-c_{n-1}/c_n)v_{n-1}.

This shows that v_n ∈ Span(v_1, ..., v_{n-1}). Putting v = v_n and S = (v_1, ..., v_{n-1}) in
(3.11), we conclude Span(v_1, ..., v_{n-1}) = Span(v_1, ..., v_n) = V. So we may
eliminate v_n from S. Continuing this way we eventually obtain a family which is linearly
independent but still spans V, that is, a basis.

Note. There is a problem with this proof if V is the zero vector space {0}. For,
starting with an arbitrary collection of vectors in V (all of them equal to zero), our
procedure will throw them out, one at a time, until there is only one vector v_1 = 0
left. And (0) is a linearly dependent set. How can we eliminate it? Of course the
zero vector space is not particularly interesting. But it may lurk around, waiting to
trip us up. We have to allow the possibility that a vector space which arises in the
course of some computation, such as solving a system of homogeneous linear
equations, is the zero space. In order to avoid having to make special mention of this
case in the future, we adopt the following conventions:

(3.14)  (a) The empty set is linearly independent.
        (b) The span of the empty set is the zero subspace.

Thus the empty set is a basis for the zero vector space. These conventions allow us
to throw out the last vector v_1 = 0, and rescue the proof.
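
The elimination procedure of (3.13) can be imitated on a computer. The sketch below is one possible version in Python, assuming rational entries; the names rank and extract_basis are ours. Instead of solving a relation for the last vector, it uses the equivalent test from (3.10): a vector is kept only if it is not in the span of the vectors already kept. With the conventions (3.14), a spanning set consisting only of zero vectors yields the empty basis.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, computed by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extract_basis(spanning_set):
    """Return a sublist of spanning_set that is a basis of its span."""
    basis = []
    for v in spanning_set:
        # Keep v only if adding it raises the rank, i.e. v is not in Span(basis).
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

print(extract_basis([(1, 0), (2, 0), (0, 1), (1, 1)]))
print(extract_basis([(0, 0), (0, 0)]))   # the zero space: the empty set is its basis
```

Which basis comes out depends on the order in which the vectors are listed; a different ordering of the same spanning set may keep different vectors, but always the same number of them.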


(3.15) Proposition. Let V be a finite-dimensional vector space. Any linearly inde- 
pendent set L can be extended by adding elements, to get a basis. 


Proof. Let S be a finite set which spans V. If all elements of S are in Span L,
then L spans V (3.2) and so it is a basis. If not, choose v ∈ S which is not in
Span L. By (3.10), (L, v) is linearly independent. Continue until you get a basis. □


(3.16) Proposition. Let S, L be finite subsets of V. Assume that S spans V and that 
L is linearly independent. Then S contains at least as many elements as L does. 


Proof. To prove this, we write out what a relation of linear dependence on L
means in terms of the set S, obtaining a homogeneous system of m linear equations
in n unknowns, where m = |S| and n = |L|. Say that S = (v_1, ..., v_m) and
L = (w_1, ..., w_n). We write each vector w_j as a linear combination of S, which we
can do because S spans V, say

        w_j = a_{1j}v_1 + ... + a_{mj}v_m = Σ_i a_{ij}v_i.

Let u = c_1w_1 + ... + c_nw_n = Σ_j c_jw_j be a linear combination. Substituting, we
obtain

        u = Σ_{i,j} c_ja_{ij}v_i.

The coefficient of v_i in this sum is Σ_j a_{ij}c_j. If this coefficient is zero for every i, then
u = 0. So to find a linear relation among the vectors of L, it suffices to solve the
system Σ_j a_{ij}x_j = 0 of m equations in n unknowns. If m < n, then this system has a
nontrivial solution [see Chapter 1 (2.17)], and therefore L is linearly dependent. □


(3.17) Proposition. Two bases B_1, B_2 of the vector space V have the same number
of elements.

Proof. Put B_1 = S, B_2 = L in (3.16) to get |B_1| ≥ |B_2|. By symmetry,
|B_2| ≥ |B_1|. □


(3.18) Definition. The dimension of a finite-dimensional vector space V is the
number of vectors in a basis. The dimension will be denoted by dim V.

(3.19) Proposition.

(a) If S spans V, then |S| ≥ dim V, and equality holds only if S is a basis.
(b) If L is linearly independent, then |L| ≤ dim V, and equality holds only if L is
    a basis.

Proof. This follows from (3.13) and (3.15). □


(3.20) Proposition. If W ⊂ V is a subspace of a finite-dimensional vector space,
then W is finite-dimensional, and dim W ≤ dim V. Moreover, dim W = dim V
only if W = V.

Proof. This will be obvious, once we show that W is finite-dimensional. For, if
W < V, that is, if W is contained in but not equal to V, then a basis for W will not
span V, but it can be extended to a basis of V by (3.15). Hence dim W < dim V. We
now check finite-dimensionality: If some given linearly independent set L in W does
not span W, there is a vector w ∈ W not in Span L, and by Proposition (3.10),
(L, w) is linearly independent. So, we can start with the empty set and add elements
of W using (3.10), hoping to end up with a basis of W. Now it is obvious that if L is
a linearly independent set in W then it is also linearly independent when viewed as a
subset of V. Therefore (3.16) tells us that |L| ≤ n = dim V. So the process of
adding vectors to L must come to an end after at most n steps. When it is impossible
to apply (3.10) again, L is a basis of W. This shows that W is finite-dimensional, as
required. □


Notes. 


(a) The key facts to remember are (3.13), (3.15), and (3.16). The others follow. 

(b) This material is not deep. Given the definitions, you could produce a proof of 
the main result (3.16) in a few days or less, though your first try would probably 
be clumsy. 


One important example of a vector space is obtained from an arbitrary set S by
forming linear combinations of elements of S with coefficients in F in a formal way.
If S = (s_1, ..., s_n) is a finite ordered set whose elements are distinct, then this space
V = V(S) is the set of all expressions

(3.21)  a_1s_1 + ... + a_ns_n,   a_i ∈ F.

Addition and scalar multiplication are carried out formally, assuming no relations
among the elements s_i:

(3.22)  (a_1s_1 + ... + a_ns_n) + (b_1s_1 + ... + b_ns_n) = (a_1 + b_1)s_1 + ... + (a_n + b_n)s_n
        c(a_1s_1 + ... + a_ns_n) = (ca_1)s_1 + ... + (ca_n)s_n.

This vector space is isomorphic to F^n, by the correspondence

(3.23)  (a_1, ..., a_n) <---> a_1s_1 + ... + a_ns_n.

Therefore the elements s_i, interpreted as the linear combinations

        s_1 = 1s_1 + 0s_2 + ... + 0s_n,

form a basis which corresponds to the standard basis of F^n under the isomorphism
(3.23). Because of this, V(S) is often referred to as the space with basis S, or the
space of formal linear combinations of S. If S is an infinite set, V(S) is defined to be
the space of all finite expressions (3.21), where s_i ∈ S (see Section 5).

Since V(S) is isomorphic to F^n when S contains n elements, there is no
compelling logical reason for introducing it. However, in many applications, V(S) has a
natural interpretation. For example, if S is a set of ingredients, then a vector v may
represent a recipe. Or if S is a set of points in the plane, then v (3.21) can be
interpreted as a set of weights at the points of S.


4. COMPUTATION WITH BASES 


The purpose of bases in vector spaces is to provide a method of computation, and we
are going to learn to use them in this section. We will consider two topics: how to
express a vector in terms of a given basis, and how to relate two different bases of
the same vector space.

Suppose we are given a basis (v_1, ..., v_n) of a vector space V. Remember: This
means that every vector v ∈ V can be expressed as a linear combination

(4.1)   v = x_1v_1 + ... + x_nv_n,   x_i ∈ F,

in exactly one way. The scalars x_i are called the coordinates of v, and the column
vector

            [ x_1 ]
(4.2)   X = [  :  ]
            [ x_n ]

is called the coordinate vector of v, with respect to the basis. We pose the problem
of computing this coordinate vector.


The simplest case to understand is that V is the space of column vectors F^n.

Let B = (v_1, ..., v_n) be a basis of F^n. Then each element v_i of our basis is a column
vector, and so the array (v_1, ..., v_n) forms an n × n matrix. It seems advisable to
introduce a new symbol for this matrix, so we will write it as

              [  |         |  ]
(4.3)   [B] = [ v_1  ...  v_n ].
              [  |         |  ]


For example, if B is the basis 


(4.4) v= 31 es Bi fo tls ; al 


If E = (e_1, ..., e_n) is the standard basis, the matrix [E] is the identity matrix.
A linear combination x_1v_1 + ... + x_nv_n can be written as the matrix product

               [  |         |  ] [ x_1 ]
(4.5)   [B]X = [ v_1  ...  v_n ] [  :  ],
               [  |         |  ] [ x_n ]

where X denotes the column vector (x_1, ..., x_n)^t. This is another example of block
multiplication. The only new feature is that the definition of matrix multiplication
has caused the scalar coefficients x_i to migrate to the right side of the vectors, which
doesn't matter.

Now if a vector Y = (y_1, ..., y_n)^t is given, we can determine its coordinate
vector with respect to the basis B by solving the equation

        [  |         |  ] [ x_1 ]   [ y_1 ]
(4.6)   [ v_1  ...  v_n ] [  :  ] = [  :  ],   or [B]X = Y,
        [  |         |  ] [ x_n ]   [ y_n ]

for the unknown vector X. This is done by inverting the matrix [B].


(4.7) Proposition. Let B = (v_1, ..., v_n) be a basis of F^n, and let Y ∈ F^n be a
vector. The coordinate vector of Y with respect to the basis B is

        X = [B]^{-1}Y. □

Note that we get Y back if B is the standard basis E, because [E] is the identity
matrix. This is as it should be.
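
To see Proposition (4.7) at work numerically, one can solve [B]X = Y directly rather than form the inverse. The following Python sketch does this by Gauss-Jordan elimination with exact fractions. The function name coordinates and the sample basis v_1 = (1, 2)^t, v_2 = (1, 3)^t are our own choices for illustration; they are not the basis of Example (4.4).

```python
from fractions import Fraction

def coordinates(basis, y):
    """Solve [B] X = Y, where the basis vectors are the columns of [B]."""
    n = len(y)
    # Augmented matrix [ [B] | Y ]: row i holds the i-th entry of each basis vector.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(y[i])]
         for i in range(n)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [a / p for a in m[c]]          # scale the pivot row to get a leading 1
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n] for row in m]              # last column is now X

X = coordinates([(1, 2), (1, 3)], (3, 7))
print(X)   # coordinates of Y = (3, 7)^t, so that Y = X[0]*v_1 + X[1]*v_2
```

With the standard basis E in place of B the function returns Y itself, in agreement with the remark above.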


In Example (4.4), 
B kt eee . — 


Dive i) . 
So the coordinate vector of Y = a 1 is X= [i which means that 


Y= 70 aa 202. 
Of course we can not solve in this way unless the matrix is invertible.
Fortunately, [B] is always invertible, and in fact it can be any invertible matrix.

(4.8) Proposition. Let A be an n × n matrix with entries in a field F. The columns
of A form a basis of F^n if and only if A is invertible.

Proof. Denote the ith column of A by v_i. For any column vector
X = (x_1, ..., x_n)^t, the matrix product AX = v_1x_1 + ... + v_nx_n is a linear combination
of the set (v_1, ..., v_n). So this set is linearly independent if and only if the only
solution of the equation AX = 0 is the trivial solution X = 0. And as we know, this is
true if and only if A is invertible [Chapter 1 (2.18)]. Moreover, if (v_1, ..., v_n) is a
linearly independent set, then it forms a basis because the dimension of F^n is n. □


Now let V be an abstractly given vector space. We want to use matrix notation 
to facilitate the manipulation of bases, and the way we have written ordered sets of 
vectors was chosen with this in mind: 


(4.9)   (v_1, ..., v_m).

Perhaps this array should be called a hypervector. Unless our vectors are given
concretely, we won't be able to represent this hypervector by a matrix, so we will work
with it formally, as if it were a vector. Since multiplication of two elements of a
vector space is not defined, we can not multiply two matrices whose entries are
vectors. But there is nothing to prevent us from multiplying the hypervector (v_1, ..., v_m)
by a matrix of scalars. Thus a linear combination of these vectors can be written as
the product with a column vector X:

                          [ x_1 ]
(4.10)  (v_1, ..., v_m)   [  :  ]  =  v_1x_1 + ... + v_mx_m.
                          [ x_m ]

Evaluating the product, we obtain another vector, a linear combination. The scalar
coefficients x_i are on the right side of the vectors as before. If we use a symbol such
as B to denote the set (v_1, ..., v_m), then the notation for this linear combination
becomes very compact: BX = v_1x_1 + ... + v_mx_m.

We may also multiply a hypervector on the right by a matrix of scalars. If A is
an m × n matrix, the product will be another hypervector, say (w_1, ..., w_n):

(4.11)  (v_1, ..., v_m) A = (w_1, ..., w_n).

To evaluate the product, we use the rule for matrix multiplication:

(4.12)  w_j = v_1a_{1j} + v_2a_{2j} + ... + v_ma_{mj}.

So each vector w_j is a linear combination of (v_1, ..., v_m), and the scalar coefficients in
this linear combination form the columns of the matrix A. That is what the equation
means. For example,


        (v_1, v_2) [ 3  2  1 ]  =  (3v_1 + 4v_2, 2v_1, v_1 + v_2).
                   [ 4  0  1 ]


Let us restate this formally: 


(4.13) Proposition. Let S = (v_1, ..., v_m) and U = (w_1, ..., w_n) be ordered sets of
elements of a vector space V. The elements of U are in the span of S if and only if
there is an m × n scalar matrix A such that (v_1, ..., v_m)A = (w_1, ..., w_n). □
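
When the vectors v_i happen to be given concretely, rule (4.12) is easy to evaluate by machine. Here is a short Python sketch; the function name hypervector_times_matrix, the sample vectors, and the sample matrix are all chosen by us for illustration. Vectors are tuples of numbers, and A is a list of m rows of length n.

```python
def hypervector_times_matrix(vectors, A):
    """Return (w_1, ..., w_n) = (v_1, ..., v_m) A, following rule (4.12)."""
    m, n = len(A), len(A[0])
    if len(vectors) != m:
        raise ValueError("A must have one row for each vector v_i")
    dim = len(vectors[0])
    # w_j = v_1 a_{1j} + ... + v_m a_{mj}, computed coordinate by coordinate.
    return [
        tuple(sum(vectors[i][k] * A[i][j] for i in range(m)) for k in range(dim))
        for j in range(n)
    ]

v1, v2 = (1, 0, 2), (0, 1, 1)
A = [[3, 2, 1],
     [4, 0, 1]]
# The columns of A say: w_1 = 3v_1 + 4v_2, w_2 = 2v_1, w_3 = v_1 + v_2.
print(hypervector_times_matrix([v1, v2], A))
```

Each w_j lies in the span of (v_1, v_2) by construction, which is the easy direction of Proposition (4.13).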


Now let us consider the problem of determining the coordinate vector X of a
given vector v ∈ V with respect to a given basis B = (v_1, ..., v_n). That is, we wish
to write v = BX explicitly, as in (4.10). It is clear that this is not possible unless
both the basis and the vector are given in some explicit way, so we can not solve the
problem as posed. But we can use multiplication by the hypervector B to define
abstractly an isomorphism of vector spaces

(4.14)  ψ: F^n ---> V   sending   X ⇝ BX,

from the space F^n of column vectors to V. This map is bijective because every
vector v is a linear combination (4.10) in exactly one way: surjective because the
set B spans V, and injective because B is linearly independent. The axioms for an
isomorphism (2.13) are easy to check. We can use this isomorphism to introduce
coordinates into the vector space V.

The coordinate vector of a vector v is X = ψ^{-1}(v). Please note that the symbol
B^{-1} is not defined. So unless the basis is given more specifically, we won't have an
explicit formula for the inverse function ψ^{-1}. But the existence of the isomorphism ψ
is of interest in itself:

(4.15) Corollary. Every vector space V of dimension n is isomorphic to the space
F^n of column vectors. □

Notice that F^n is not isomorphic to F^m if m ≠ n, because F^n has a basis of n
elements, and the number of elements in a basis depends only on the vector space,
not on the choice of a basis. Thus the finite-dimensional vector spaces V over a field
F are completely classified by (4.15): Every V is isomorphic to F^n, for some
uniquely determined integer n. It follows that we will know all about an arbitrary
vector space if we study the basic examples of column vectors. This reduces any
problem on vector spaces to the familiar algebra of column vectors, once a basis is
given.

We now come to a very important computational method: change of basis.
Identifying V with the isomorphic vector space F^n is useful when a natural basis is
presented to us, but not when the given basis is poorly suited to the problem at hand.
In that case, we will want to change coordinates. So let us suppose that we are given
two bases for the same vector space V, say B = (v_1, ..., v_n) and B' = (v_1', ..., v_n').
We will think of B as the old basis, and B' as a new basis. There are two
computations which we wish to clarify. We ask first: How are the two bases related?
Secondly, a vector v ∈ V will have coordinates with respect to each of these bases, but
of course they will be different. So we ask: How are the two coordinate vectors
related? These are the computations called change of basis. They will be very
important in later chapters. They are also confusing and can drive you nuts if you don't
organize the notation well.

We begin by noting that since the new basis spans V, every vector of the old
basis B is a linear combination of the new basis B' = (v_1', ..., v_n'). So Proposition
(4.13) tells us that there is an equation of the form

(4.16)  (v_1', ..., v_n') P = (v_1, ..., v_n),   or   B'P = B,

where P is an n × n matrix of scalars. This matrix equation reads

(4.17)  v_1'p_{1j} + v_2'p_{2j} + ... + v_n'p_{nj} = v_j,

where p_{ij} are the entries of P. The matrix P is called the matrix of change of basis.
Its jth column is the coordinate vector of the old basis vector v_j, when computed
with respect to the new basis B'.

Note that the matrix of change of basis is invertible. This can be shown as
follows: Interchanging the roles of B and B' provides a matrix P' such that BP' = B'.
Combining this with (4.16), we obtain the relation BP'P = B:

        (v_1, ..., v_n) P'P = (v_1, ..., v_n).

This formula expresses each v_i as a linear combination of the vectors (v_1, ..., v_n). The
entries of the product matrix P'P are the coefficients. But since B is a linearly
independent set, there is only one way to write v_i as such a linear combination of
(v_1, ..., v_n), namely v_i = v_i, or BI = B. So P'P = I. This shows that P is invertible. □

Now let X be the coordinate vector of v, computed with respect to the old basis
B, that is, v = BX. Substituting (4.16) gives us the matrix equation

(4.18)  v = BX = B'PX.

This equation shows that PX = X' is the coordinate vector of v with respect to the
new basis B'.

Recapitulating, we have a single matrix P, the matrix of change of basis, with
the dual properties

(4.19)  B = B'P   and   PX = X',

where X, X' denote the coordinate vectors of an arbitrary vector v with respect to the
two bases. Each of these properties characterizes P. Note the position of the primes
carefully.


The matrix of change of basis can be computed explicitly when V = F^n and the
old basis is the standard basis E, but where the new basis B' is arbitrary. The two
bases determine matrices [E] = I and [B'], as in (4.3). Formula (4.19) gives us the
matrix equation I = [B']P. Hence the matrix of change of basis is

(4.20)  P = [B']^{-1},   if V = F^n and if the old basis is E.

We can also write this as [B'] = P^{-1}. So

(4.21)  If the old basis is E, the new basis vectors are the columns of P^{-1}.
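
The two properties (4.19) can be checked on a small example. The Python sketch below builds P column by column, using the description of its jth column as the coordinate vector of the old basis vector v_j with respect to B'. The function names and the sample new basis (1, 2)^t, (1, 3)^t are our own, chosen for illustration, and rational entries are assumed.

```python
from fractions import Fraction

def coordinates(basis, y):
    """Solve [B] X = Y by Gauss-Jordan elimination (basis vectors are columns)."""
    n = len(y)
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(y[i])]
         for i in range(n)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [a / p for a in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n] for row in m]

def change_of_basis_matrix(old, new):
    """Matrix P with B = B'P: column j is the coordinate vector of old[j] in new."""
    columns = [coordinates(new, v) for v in old]
    n = len(columns)
    return [[columns[j][i] for j in range(n)] for i in range(n)]  # list of rows

def times(P, x):
    """Matrix-vector product PX."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

E = [(1, 0), (0, 1)]                    # old basis: the standard basis
B_new = [(1, 2), (1, 3)]                # new basis B'
P = change_of_basis_matrix(E, B_new)    # by (4.20) this is [B']^{-1}
X_new = times(P, [3, 7])                # X' = PX for the vector with old coordinates (3, 7)
print(P, X_new)
```

Multiplying out X_new[0]*(1, 2) + X_new[1]*(1, 3) recovers the vector (3, 7), which is the statement v = B'X'.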


In the above discussion, the matrix P was determined in terms of two bases B
and B'. We could also turn the discussion around, starting with just one basis B and
an invertible matrix P ∈ GL_n(F). Then we can define a new basis by formula
(4.16), that is,

(4.22)  B' = BP^{-1}.

The vectors v_i making up the old basis are in the span of B' because B = B'P
(4.13). Hence B' spans V and, having the right number of elements, B' is a basis.

(4.23) Corollary. Let B be a basis of a vector space V. The other bases are the
sets of the form B' = BP^{-1}, where P ∈ GL_n(F) is an invertible matrix.

It is, of course, unnecessary to put an inverse matrix into this statement. Since P is
arbitrary, so is P^{-1}. We could just as well set P^{-1} = Q and say B' = BQ, where
Q ∈ GL_n(F). □


As an application of our discussion, let us compute the order of the general
linear group GL_2(F) when F is the prime field F_p. We do this by computing the number
of bases of the vector space V = F^2. Since the dimension of V is 2, any linearly
independent set (v_1, v_2) of two elements forms a basis. The first vector v_1 of a linearly
independent set is not zero. And since the order of F is p, V contains p^2 vectors
including 0. So there are p^2 - 1 choices for the vector v_1. Next, a set (v_1, v_2) of two
vectors, with v_1 nonzero, is linearly independent if and only if v_2 is not a multiple of
v_1 (3.7). There are p multiples of a given nonzero vector v_1. Therefore if v_1 is given,
there are p^2 - p vectors v_2 such that (v_1, v_2) is linearly independent. This gives us

        (p^2 - 1)(p^2 - p) = p(p + 1)(p - 1)^2

bases for V altogether.
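
For small primes the count can be confirmed by brute force. The following Python sketch, with a function name of our own choosing, lists all pairs (v_1, v_2) of vectors in F_p^2 and counts those in which v_1 is nonzero and v_2 is not a multiple of v_1. It assumes that p is prime, so that the integers modulo p form a field.

```python
from itertools import product

def count_bases(p):
    """Count ordered bases (v1, v2) of F_p^2 by direct enumeration (p prime)."""
    vectors = list(product(range(p), repeat=2))
    count = 0
    for v1 in vectors:
        if v1 == (0, 0):
            continue                      # the first vector must be nonzero
        # All p scalar multiples c*v1 with c in F_p.
        multiples = {((c * v1[0]) % p, (c * v1[1]) % p) for c in range(p)}
        count += sum(1 for v2 in vectors if v2 not in multiples)
    return count

for p in (2, 3, 5, 7):
    # Each line shows p, the enumerated count, and the closed formula.
    print(p, count_bases(p), p * (p + 1) * (p - 1) ** 2)
```

The enumeration takes on the order of p^4 steps, so it is only meant as a check for very small p.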
(4.24) Corollary. The general linear group GL_2(F_p) has order p(p + 1)(p - 1)^2.

Proof. Proposition (4.23) establishes a bijective correspondence between
bases of F^n and elements of GL_n(F). □


5. INFINITE-DIMENSIONAL SPACES


Some vector spaces are too big to be spanned by any finite set of vectors. They are
called infinite-dimensional. We are not going to need them very often, but since they
are so important in analysis, we will discuss them briefly.

The most obvious example of an infinite-dimensional space is the space R^∞ of
infinite real vectors

(5.1)   (a) = (a_1, a_2, a_3, ...).

It can also be thought of as the space of sequences {a_n} of real numbers. Examples
(1.7c, d) are also infinite-dimensional.
The space R^∞ has many important subspaces. Here are a few examples:

(5.2) Examples.
(a) Convergent sequences: C = {(a) ∈ R^∞ | lim a_n exists}.
(b) Bounded sequences: ℓ^∞ = {(a) ∈ R^∞ | {a_n} is bounded}.

    A sequence {a_n} is called bounded if there is some real number b, a bound,
    such that |a_n| ≤ b for all n.

(c) Absolutely convergent series: ℓ^1 = {(a) ∈ R^∞ | Σ_{n=1}^∞ |a_n| < ∞}.
(d) Sequences with finitely many nonzero terms:
    Z = {(a) ∈ R^∞ | a_n = 0 for all but finitely many n}.

All of the above subspaces are infinite-dimensional. You should be able to make up
some more.


Now suppose that V is a vector space, infinite-dimensional or not. What should
we mean by the span of an infinite set S of vectors? The difficulty is this: It is not
always possible to assign a vector as the value of an infinite linear combination
c_1v_1 + c_2v_2 + ... in a consistent way. If we are talking about the vector space of
real numbers, that is, v_i ∈ R^1, then a value can be assigned provided that the series
c_1v_1 + c_2v_2 + ... converges. The same can be done for convergent series of vectors
in R^n or R^∞. But many series don't converge, and then we don't know what value to
assign.

In algebra it is customary to speak only of linear combinations of finitely many 
vectors. Therefore, the span of an infinite set S must be interpreted as the set of 
those vectors v which are linear combinations of finitely many elements of S: 


(5.3)   v = c_1v_1 + ... + c_rv_r,   where v_1, ..., v_r ∈ S.

The number r is allowed to be arbitrarily large, depending on the vector v:

(5.4)   Span S = {finite linear combinations of elements of S}.

With this definition, Propositions (3.2) and (3.11) continue to hold.

For example, let e_i = (0, ..., 0, 1, 0, ...) be the vector in R^∞ with 1 in the ith
position as its only nonzero coordinate. Let S = (e_1, e_2, e_3, ...) be the infinite set of
these vectors e_i. The set S does not span R^∞, because the vector

        w = (1, 1, 1, ...)

is not a (finite) linear combination. Instead the span of S is the subspace Z (5.2d).
A set S, infinite or not, is called linearly independent if there is no finite
relation

(5.5)   c_1v_1 + ... + c_rv_r = 0,   v_1, ..., v_r ∈ S,

except for the trivial relation, in which c_1 = ... = c_r = 0. Again, the number r is
allowed to be arbitrary, that is, the condition has to hold for arbitrarily large r and
arbitrary vectors v_1, ..., v_r ∈ S. For example, the set S' = (w; e_1, e_2, e_3, ...) is
linearly independent, if w, e_i are the vectors defined as above. With this definition of
linear independence, Proposition (3.10) continues to be true.

As with finite sets, a basis S of V is a linearly independent set which spans V.
Thus S = (e_1, e_2, ...) is a basis of the space Z. It can be shown, using the Axiom of
Choice, that every vector space V has a basis. However, the proof doesn't tell you
how to get one. A basis for R^∞ will have uncountably many elements, and therefore
it can not be written down in an explicit way. We won't need bases for
infinite-dimensional spaces very often.

Let us go back for a moment to the case that our vector space V is
finite-dimensional (3.12), and ask if there can be an infinite basis. In Section 3, we saw
that any two finite bases have the same number of elements. We will now complete
the picture by showing that every basis is finite. The only confusing point is taken
care of by the following proposition:

(5.6) Proposition. Let V be finite-dimensional, and let S be any set which spans
V. Then S contains a finite subset which spans V.

Proof. By assumption, there is some finite set, say (w_1, ..., w_m), which spans
V. Each w_i is a linear combination of finitely many elements of S, since Span S = V.
So when we express the vectors w_1, ..., w_m in terms of the set S, we only need to use
finitely many of its elements. The ones we use make up a finite subset S' ⊂ S. So,
(w_1, ..., w_m) ⊂ Span S'. Since (w_1, ..., w_m) spans V, so does S'. □


(5.7) Proposition. Let V be a finite-dimensional vector space.

(a) Every set S which spans V contains a finite basis.
(b) Every linearly independent set L is finite and therefore extends to a finite basis.
(c) Every basis is finite.

We leave the proof of (5.7) as an exercise. □


6. DIRECT SUMS


Let V be a vector space, and let W_1, ..., W_n be subspaces of V. Much of the treatment
of linear independence and spans of vectors has analogues for subspaces, and we are
going to work out these analogues here.

We consider vectors v ∈ V which can be written as a sum

(6.1)   v = w_1 + ... + w_n,

where w_i is a vector in W_i. The set of all such vectors is called the sum of the
subspaces or their span, and is denoted by

(6.2)   W_1 + ... + W_n = {v ∈ V | v = w_1 + ... + w_n, with w_i ∈ W_i}.

The sum is a subspace of V, analogous to the span of a set {v_1, ..., v_n} of vectors.
Clearly, it is the smallest subspace containing W_1, ..., W_n.
The subspaces W_1, ..., W_n are called independent if no sum w_1 + ... + w_n with
w_i ∈ W_i is zero, except for the trivial sum in which w_i = 0 for all i. In other words,
the spaces are independent if

(6.3)   w_1 + ... + w_n = 0 and w_i ∈ W_i implies w_i = 0 for all i.

In case the span is the whole space and the subspaces are independent, we say
that V is the direct sum of W_1, ..., W_n, and we write

(6.4)   V = W_1 ⊕ ... ⊕ W_n,   if V = W_1 + ... + W_n
                                and if W_1, ..., W_n are independent.

This is equivalent to saying that every vector v ∈ V can be written in the form (6.1)
in exactly one way.

So, if W_1, ..., W_n are independent subspaces of a vector space V and if
U = W_1 + ... + W_n is their sum, then in fact U is their direct sum:
U = W_1 ⊕ ... ⊕ W_n.

We leave the proof of the following two propositions as an exercise.


(6.5) Proposition.


(a) A single subspace W_1 is independent.
(b) Two subspaces W_1, W_2 are independent if and only if W_1 ∩ W_2 = (0). □


(6.6) Proposition. Let W_1,..., W_n be subspaces of a finite-dimensional vector
space V, and let B_i be a basis for W_i.


(a) The ordered set B obtained by listing the bases B_1,..., B_n in order is a basis of
V if and only if V is the direct sum W_1 ⊕ ... ⊕ W_n.


(b) dim(W_1 + ... + W_n) ≤ (dim W_1) + ... + (dim W_n), with equality if and only
if the spaces are independent. □
(6.7) Corollary. Let W be a subspace of a finite-dimensional vector space V.
There is another subspace W' such that V = W ⊕ W'.


Proof. Let (w_1,..., w_d) be a basis for W. Extend to a basis (w_1,..., w_d;
v_1,..., v_{n−d}) for V (3.15). The span of (v_1,..., v_{n−d}) is the required subspace W'. □
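
Proposition (6.6b) gives a practical test for independence: list the bases together and compare the rank of the combined list with the sum of the dimensions. The following is a minimal sketch in Python, not part of the text. The subspaces W1, W2, W3 of Q^3 are hypothetical examples, and each list passed to are_independent is assumed to be a basis of its subspace.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # look for a row at or below position r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def are_independent(bases):
    """bases: one list of basis vectors per subspace W_i (assumed to be bases)."""
    combined = [v for basis in bases for v in basis]
    # (6.6b): equality dim(W_1 + ... + W_n) = sum of dim W_i means independence
    return rank(combined) == sum(len(basis) for basis in bases)

# Hypothetical subspaces of Q^3
W1 = [[1, 0, 0], [0, 1, 0]]   # the plane spanned by e_1, e_2
W2 = [[0, 0, 1]]              # the line spanned by e_3
W3 = [[1, 1, 0]]              # a line contained in W1

print(are_independent([W1, W2]))  # True
print(are_independent([W1, W3]))  # False
```

In the first case the combined list is a basis of Q^3, so Q^3 = W1 ⊕ W2 by (6.6a).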


(6.8) Example. Let v_1,..., v_n be nonzero vectors, and let W_i be the span of the
single vector v_i. This is the one-dimensional subspace which consists of all scalar
multiples of v_i: W_i = {cv_i}. Then W_1,..., W_n are independent subspaces if and only if
(v_1,..., v_n) are independent vectors. This becomes clear if we compare (3.4) and
(6.3). The statement in terms of subspaces is actually the neater one, because the
scalar coefficients are absorbed.


(6.9) Proposition. Let W_1, W_2 be subspaces of a finite-dimensional vector space V.
Then


dim W_1 + dim W_2 = dim(W_1 ∩ W_2) + dim(W_1 + W_2).


Proof. Note first that the intersection of two subspaces is again a subspace.
Choose a basis (u_1,..., u_r) for the space W_1 ∩ W_2, where r = dim(W_1 ∩ W_2). This is
a linearly independent set, and it is in W_1. Hence we can extend it to a basis of W_1,
say


(6.10) (u_1,..., u_r; x_1,..., x_{m−r}),
where m = dim W_1. Similarly, we can extend it to a basis


(6.11) (u_1,..., u_r; y_1,..., y_{n−r})


of W_2, where n = dim W_2. The proposition will follow if we show that the set
(6.12) (u_1,..., u_r; x_1,..., x_{m−r}; y_1,..., y_{n−r})


is a basis of W_1 + W_2.

This assertion has two parts. First, the vectors (6.12) span W_1 + W_2. For any
vector v in W_1 + W_2 is a sum v = w_1 + w_2, with w_i ∈ W_i. We can write w_1 as a
linear combination of (6.10), and w_2 as a linear combination of (6.11). Collecting
terms, we find that v is a linear combination of (6.12).

Next, the vectors (6.12) are linearly independent: Suppose that some linear
combination is zero, say


a_1u_1 + ... + a_ru_r + b_1x_1 + ... + b_{m−r}x_{m−r} + c_1y_1 + ... + c_{n−r}y_{n−r} = 0.


Abbreviate this as u + x + y = 0. Solve for y: y = −u − x ∈ W_1. But y ∈ W_2
too. Hence y ∈ W_1 ∩ W_2, and so y is a linear combination, say u', of (u_1,..., u_r).
Then −u' + y = 0 is a relation among the vectors (6.11), which are independent.
So it must be the trivial relation. This shows that y = 0. Thus our original relation
reduces to u + x = 0. Since (6.10) is a basis, this relation is trivial: u = 0 and
x = 0. So the whole relation was trivial, as required. □
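
Formula (6.9) can be checked numerically on a small example. The sketch below is an illustration in Python, not part of the text. It computes W_1 ∩ W_2 directly by solving Σ a_i u_i − Σ b_j v_j = 0 for the coefficients, and then compares dimensions. The two subspaces of Q^4 are hypothetical, and the input lists are assumed to be bases.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][col]
        m[r] = [a / lead for a in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots

def dim_span(vectors):
    return len(rref(vectors)[1]) if vectors else 0

def nullspace(rows):
    """Basis of the solutions x of Mx = 0, where M has the given rows."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for r, p in enumerate(pivots):
            x[p] = -m[r][free]
        basis.append(x)
    return basis

def intersection(basis1, basis2):
    """Basis of W1 ∩ W2, given bases of W1 and W2."""
    n = len(basis1[0])
    # one equation per coordinate; unknowns are the coefficients (a, b)
    system = [[u[k] for u in basis1] + [-v[k] for v in basis2] for k in range(n)]
    return [[sum(a * u[k] for a, u in zip(sol, basis1)) for k in range(n)]
            for sol in nullspace(system)]

# Hypothetical subspaces of Q^4
W1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
W2 = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0]]

d1, d2 = dim_span(W1), dim_span(W2)
d_int = dim_span(intersection(W1, W2))
d_sum = dim_span(W1 + W2)          # list concatenation: a spanning set of W1 + W2
print(d1, d2, d_int, d_sum)        # 3 3 2 4
print(d1 + d2 == d_int + d_sum)    # True
```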
I don't need to learn 8 + 7: I'll remember 8 + 8 and subtract 1.


T. Cuyler Young, Jr. 


EXERCISES
1. Real Vector Spaces


1. Which of the following subsets of the vector space of real n × n matrices is a subspace?
(a) symmetric matrices (A = A^t)
(b) invertible matrices
(c) upper triangular matrices
2. Prove that the intersection of two subspaces is a subspace.
3. Prove the cancellation law in a vector space: If cv = cw and c ≠ 0, then v = w.
4. Prove that if w is an element of a subspace W, then −w ∈ W too.
5. Prove that the classification of subspaces of R^3 stated after (1.2) is complete.
6. Prove that every solution of the equation 2x_1 − x_2 − 2x_3 = 0 has the form (1.4).
7. What is the description analogous to (1.4) obtained from the particular solutions
u_1 = (2,2,1) and u_2 = (0,2,−1)?


2. Abstract Fields


1. Prove that the set of numbers of the form a + b√2, where a, b are rational numbers, is
a field.
2. Which subsets of C are closed under +, −, ×, and ÷ but fail to contain 1?
3. Let F be a subset of C such that F^+ is a subgroup of C^+ and F^× is a subgroup of C^×.
Prove that F is a subfield of C.
4. Let V = F^n be the space of column vectors. Prove that every subspace W of V is the
space of solutions of some system of homogeneous linear equations AX = 0.
5. Prove that a nonempty subset W of a vector space satisfies the conditions (2.12) for a
subspace if and only if it is closed under addition and scalar multiplication.
6. Show that in Definition (2.3), axiom (ii) can be replaced by the following axiom: F^× is
an abelian group, and 1 ≠ 0. What if the condition 1 ≠ 0 is omitted?
7. Define homomorphism of fields, and prove that every homomorphism of fields is
injective.
8. Find the inverse of 5 (modulo p) for p = 2, 3, 7, 11, 13.
9. Compute the polynomial (x^2 + 3x + 1)(x^3 + 4x^2 + 2x + 2) when the coefficients are
regarded as elements of the fields (a) F_5 (b) F_7.
10. Consider the system of linear equations E A = il 
2 6 X2 =|| 
(a) Solve it in F_p when p = 5, 11, 17.
(b) Determine the number of solutions when p = 7.
11. Find all primes p such that the matrix


is invertible, when its entries are considered to be in F_p.
12. Solve completely the systems of linear equations AX = B, where


i jt @ 0 1 
A=/]1 01], B=]0 and B=]-1 
ale 0 1 


(a) inZ (b) inF, (c) inF; (d) in fF. 

13. Let p be a prime integer. The nonzero elements of F_p form a group F_p^× of order p − 1.
It is a fact that this group is always cyclic. Verify this for all primes p < 20 by exhibiting
a generator.
14. (a) Let p be a prime. Use the fact that F_p^× is a group to prove that a^{p−1} ≡ 1 (modulo p)
for every integer a not congruent to zero.
(b) Prove Fermat's Theorem: For every integer a,


a^p ≡ a (modulo p).


15. (a) By pairing elements with their inverses, prove that the product of all nonzero
elements of F_p is −1.
(b) Let p be a prime integer. Prove Wilson's Theorem:
(p − 1)! ≡ −1 (modulo p).
16. Consider a system AX = B of n linear equations in n unknowns, where A and B have
integer entries. Prove or disprove: If the system has an integer solution, then it has a
solution in F_p for all p.


17. Interpreting matrix entries in the field F_2, prove that the four matrices


[ 0 0 ]   [ 1 0 ]   [ 1 1 ]   [ 0 1 ]
[ 0 0 ],  [ 0 1 ],  [ 1 0 ],  [ 1 1 ]


form a field.
18. The proof of Lemma (2.8) contains a more direct proof of (2.6). Extract it.


3. Bases and Dimension 


1. Find a basis for the subspace of R^4 spanned by the vectors (1,2,−1,0), (4,8,−4,−3),
(0,1,3,4), (2,5,1,4).
2. Let W C R‘ be the space of solutions of the system of linear equations AX = 0, where 


A= li : al Find a basis for W. 
3. (a) Show that a subset of a linearly independent set is linearly independent.
(b) Show that any reordering of a basis is also a basis.
4. Let V be a vector space of dimension n over F, and let 0 ≤ r ≤ n. Prove that V contains
a subspace of dimension r.
5. Find a basis for the space of symmetric n × n matrices.
6. Prove that a square matrix A is invertible if and only if its columns are linearly
independent.
7. Let V be the vector space of functions on the interval [0, 1]. Prove that the functions x^3,
sin x, and cos x are linearly independent.
8. Let A be an m × n matrix, and let A' be the result of a sequence of elementary row
operations on A. Prove that the rows of A span the same subspace as the rows of A'.
9. Let V be a complex vector space of dimension n. Prove that V has dimension 2n as real
vector space.
10. A complex n × n matrix is called hermitian if a_ij = ā_ji for all i, j. Show that the
hermitian matrices form a real vector space, find a basis for that space, and determine its
dimension.
11. How many elements are there in the vector space F_p^n?

12. Let F = F2. Find all bases of F?. 

13. Let F = F;. How many subspaces of each dimension does the space F* contain? 

14. (a) Let V be a vector space of dimension 3 over the field F_p. How many subspaces of
each dimension does V have?
(b) Answer the same question for a vector space of dimension 4.
15. (a) Let F = F_2. Prove that the group GL_2(F) is isomorphic to the symmetric group S_3.
(b) Let F = F,. Determine the orders of GL2(F) and of SL2(F). 

16. Let W be a subspace of V.
(a) Prove that there is a subspace U of V such that U + W = V and U ∩ W = 0.
(b) Prove that there is no subspace U such that W ∩ U = 0 and that
dim W + dim U > dim V.


4. Computation with Bases


1. Compute the matrix P of change of basis in F^2 relating the standard basis E to
B' = (v_1, v_2), where v_1 = (1,3)^t, v_2 = (2,2)^t.
2. Determine the matrix of change of basis, when the old basis is the standard basis
(e_1,..., e_n) and the new basis is (e_n, e_{n−1},..., e_1).
3. Determine the matrix P of change of basis when the old basis is (e_1, e_2) and the new basis
NS (ig, PCa ei = Calh 

4. Consider the equilateral coordinate system for R^2, given by the basis B' in which v_1 = e_1
and v_2 is a vector of unit length making an angle of 120° with v_1. Find the matrix
relating the standard basis E to B'.

4. (1) Prove that theset B= ((1, 270). (2. 1.2) (aol is apace 
(11) Find the coordinate vector of the vector v = (1,2. 3)' with respect to this basis. 
(ii) Let BY = ((O. 1.0). (1.0. 1). (2. 1.03'). Find the matrix P relating B to B’. 
(iv) For which primes p is B a basis of F,°? 

6. Let B and B' be two bases of the vector space F^n. Prove that the matrix of change of
basis is P = [B']^{-1}[B].
7. Let B = (v_1,..., v_n) be a basis of a vector space V. Prove that one can get from B to any
other basis B' by a finite sequence of steps of the following types:
(i) Replace v_i by v_i + av_j, i ≠ j, for some a ∈ F.
(ii) Replace v_i by cv_i for some c ≠ 0.
(iii) Interchange v_i and v_j.
8. Rewrite the proof of Proposition (3.16) using the notation of Proposition (4.13).
9. Let V = F^n. Establish a bijective correspondence between the set of bases of V and
GL_n(F).
10. Let F be a field containing 81 elements, and let V be a vector space of dimension 3 over
F. Determine the number of one-dimensional subspaces of V.
11. Let F = F_p.
(a) Compute the order of SL_2(F).
(b) Compute the number of bases of F^n, and the orders of GL_n(F) and SL_n(F).
12. (a) Let A be an m × n matrix with m < n. Prove that A has no left inverse by comparing
A to the square n × n matrix obtained by adding (n − m) rows of zeros at the bottom.
(b) Let B = (v_1,..., v_m) and B' = (v_1',..., v_n') be two bases of a vector space V. Prove
that m = n by defining matrices of change of basis and showing that they are
invertible.


5. Infinite-Dimensional Spaces


1. Prove that the set (w; e_1, e_2, ...) introduced in the text is linearly independent, and
describe its span.
2. We could also consider the space of doubly infinite sequences (a) = (..., a_{−1}, a_0, a_1, ...),
with a_i ∈ R. Prove that this space is isomorphic to R^∞.
3. Prove that the space Z is isomorphic to the space of real polynomials.
4. Describe five more infinite-dimensional subspaces of the space R^∞.
5. For every positive integer p, we can define the space ℓ^p to be the space of sequences such
that Σ |a_i|^p < ∞.
(a) Prove that ℓ^p is a subspace of R^∞.
(b) Prove that ℓ^p ⊂ ℓ^{p+1}.
6. Let V be a vector space which is spanned by a countably infinite set. Prove that every
linearly independent subset of V is finite or countably infinite.
7. Prove Proposition (5.7).


6. Direct Sums


1. Prove that the space R^{n×n} of all n × n real matrices is the direct sum of the spaces of
symmetric matrices (A = A^t) and of skew-symmetric matrices (A = −A^t).
2. Let W be the space of n × n matrices whose trace is zero. Find a subspace W' so that
R^{n×n} = W ⊕ W'.
3. Prove that the sum of subspaces is a subspace.
4. Prove Proposition (6.5).
5. Prove Proposition (6.6).


Miscellaneous Problems


1. (a) Prove that the set of symbols {a + bi | a, b ∈ F_3} forms a field with nine elements,
if the laws of composition are made to mimic addition and multiplication of complex
numbers.

(b) Will the same method work for F;? For F;? Explain. 
*2. Let V be a vector space over an infinite field F. Prove that V is not the union of finitely
many proper subspaces.
*3. Let W_1, W_2 be subspaces of a vector space V. The formula dim(W_1 + W_2) = dim W_1 +
dim W_2 − dim(W_1 ∩ W_2) is analogous to the formula |S_1 ∪ S_2| = |S_1| + |S_2| −
|S_1 ∩ S_2|, which holds for sets. If three sets are given, then


|S_1 ∪ S_2 ∪ S_3| = |S_1| + |S_2| + |S_3|
− |S_1 ∩ S_2| − |S_1 ∩ S_3| − |S_2 ∩ S_3| + |S_1 ∩ S_2 ∩ S_3|.


Does the corresponding formula for dimensions of subspaces hold?


4. Let F be a field which is not of characteristic 2, and let x^2 + bx + c = 0 be a quadratic
equation with coefficients in F. Assume that the discriminant b^2 − 4c is a square in F,
that is, that there is an element δ ∈ F such that δ^2 = b^2 − 4c. Prove that the quadratic
formula x = (−b + δ)/2 solves the quadratic equation in F, and that if the
discriminant is not a square the polynomial has no root in F.

5. (a) What are the orders of the elements | il 7 ! of GL2(R)? 

(b) Interpret the entries of these matrices as elements of F,;, and compute their orders in 
the group GL2(F-). 

6. Consider the function det: F^{2×2} → F, where F = F_p is a finite field with p elements
and F^{2×2} is the set of 2 × 2 matrices.
(a) Show that this map is surjective.
(b) Prove that all nonzero values of the determinant are taken on the same number of
times.
7. Let A be an n × n real matrix. Prove that there is a polynomial f(t) = a_r t^r +
a_{r−1} t^{r−1} + ... + a_1 t + a_0 which has A as root, that is, such that a_r A^r + a_{r−1} A^{r−1} +
... + a_1 A + a_0 I = 0. Do this by showing that the matrices I, A, A^2, ... are linearly
dependent.
*8. An algebraic curve in R^2 is the locus of zeros of a polynomial f(x, y) in two variables.
By a polynomial path in R^2, we mean a parametrized path x = x(t), y = y(t), where
x(t), y(t) are polynomials in t.
(a) Prove that every polynomial path lies on a real algebraic curve by showing that, for
sufficiently large n, the functions x(t)^i y(t)^j, 0 ≤ i, j ≤ n, are linearly dependent.

(b) Determine the algebraic curve which is the image of the path x = t? + t, y = rt? ex- 
plicitly, and draw it. 




Chapter 4 


Linear Transformations 


That confusions of thought and errors of reasoning 
still darken the beginnings of Algebra, 
is the earnest and just complaint of sober and thoughtful men. 


Sir William Rowan Hamilton 


1. THE DIMENSION FORMULA


The analogue for vector spaces of a homomorphism of groups is a map

T: V → W

from one vector space over a field F to another, which is compatible with addition
and scalar multiplication:


(1.1) T(v_1 + v_2) = T(v_1) + T(v_2) and T(cv) = cT(v),


for all v_1, v_2 in V and all c ∈ F. It is customary to call such a map a linear
transformation, rather than a homomorphism. However, use of the word homomorphism
would be correct too. Note that a linear transformation is compatible with linear
combinations:


(1.2) T(Σ_i c_i v_i) = Σ_i c_i T(v_i).


This follows from (1.1) by induction. Note also that the first of the conditions of
(1.1) says that T is a homomorphism of additive groups V^+ → W^+.

We already know one important example of a linear transformation, which is
in fact the main example: left multiplication by a matrix. Let A be an m × n matrix
with entries in F, and consider A as an operator on column vectors. It defines a
linear transformation

(1.3) F^n → F^m (left multiplication by A), sending X to AX.

Indeed, A(X_1 + X_2) = AX_1 + AX_2, and A(cX) = cAX.
Another example: Let P_n be the vector space of real polynomial functions of
degree ≤ n, of the form

(1.4) a_n x^n + a_{n−1} x^{n−1} + ... + a_1 x + a_0.

The derivative d/dx is a linear transformation from P_n to P_{n−1}.
Let T: V → W be any linear transformation. We introduce two subspaces
(1.5) ker T = kernel of T = {v ∈ V | T(v) = 0}
      im T = image of T = {w ∈ W | w = T(v) for some v ∈ V}.

As one may guess from the similar case of group homomorphisms (Chapter 2,
Section 4), ker T is a subspace of V and im T is a subspace of W.

It is interesting to interpret the kernel and image in the case that T is left
multiplication by a matrix A. In that case the kernel of T is the set of solutions of the
homogeneous linear equation AX = 0. The image of T is the set of vectors B ∈ F^m such
that the linear equation AX = B has a solution.

The main result of this section is the dimension formula, given in the next
theorem.


(1.6) Theorem. Let T: V → W be a linear transformation, and assume that V is
finite-dimensional. Then


dim V = dim(ker T) + dim(im T).


The dimensions of im T and ker T are called the rank and nullity of T,
respectively. Thus (1.6) reads
(1.7) dim V = rank + nullity.
Note the analogy with the formula |G| = |ker φ| |im φ| for homomorphisms of
groups [Chapter 2 (6.15)].
The rank and nullity of an m × n matrix A are defined to be the dimensions of
the image and kernel of left multiplication by A. Let us denote the rank by r and the
nullity by k. Then k is the dimension of the space of solutions of the equation
AX = 0. The vectors B such that the linear equation AX = B has a solution form the
image, a space whose dimension is r. The sum of these two dimensions is n.
Let B be a vector in the image of multiplication by A, so that the equation
AX = B has at least one solution X = X_0. Let K denote the space of solutions of the
homogeneous equation AX = 0, the kernel of multiplication by A. Then the set of
solutions of AX = B is the additive coset X_0 + K. This restates a familiar fact: Adding
any solution of the homogeneous equation AX = 0 to a particular solution X_0 of the
inhomogeneous equation AX = B, we obtain another solution of the inhomogeneous
equation.
Suppose that A is a square n × n matrix. If det A ≠ 0, then, as we know, the
system of equations AX = B has a unique solution for every B, because A is
invertible. In this case, k = 0 and r = n. On the other hand, if det A = 0 then the space
K has dimension k > 0. By the dimension formula, r < n, which implies that the
image is not the whole space F^n. This means that not all equations AX = B have
solutions. But those that do have solutions have more than one, because the set of
solutions of AX = B is a coset of K.
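
The relations r + k = n and "solution set = X_0 + K" are easy to test on a small matrix. The following Python sketch is an illustration only, not part of the text. The 3 × 4 matrix A and the particular solution X0 are hypothetical, and the helper names rref, nullspace, matvec are ours.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][col]
        m[r] = [a / lead for a in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots

def nullspace(rows):
    """Basis of K = {X : AX = 0}: one vector per non-pivot (free) column."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for r, p in enumerate(pivots):
            x[p] = -m[r][free]
        basis.append(x)
    return basis

def matvec(A, X):
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# Hypothetical matrix: the third row is the sum of the first two, so the rank is 2
A = [[1, 2, 0, 1],
     [0, 0, 1, 3],
     [1, 2, 1, 4]]
n = len(A[0])
r = len(rref(A)[1])      # rank = number of pivots
K = nullspace(A)
k = len(K)               # nullity
print(r, k, r + k == n)  # 2 2 True

# Adding an element of K to a particular solution X0 gives another solution of AX = B
X0 = [1, 0, 2, 0]
B = matvec(A, X0)
shifted = [[a + b for a, b in zip(X0, h)] for h in K]
print(all(matvec(A, X) == B for X in shifted))  # True
```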


Proof of Theorem (1.6). Say that dim V = n. Let (u_1,..., u_k) be a basis for the
subspace ker T, and extend it to a basis of V [Chapter 3 (3.15)]:
(1.8) (u_1,..., u_k; v_1,..., v_{n−k}).
Let w_i = T(v_i) for i = 1,..., n − k. If we prove that (w_1,..., w_{n−k}) = S is a basis
for im T, then it will follow that im T has dimension n − k. This will prove the
theorem.

So we must show that S spans im T and that it is a linearly independent set. Let
w ∈ im T be arbitrary. Then w = T(v) for some v ∈ V. We write v in terms of the
basis (1.8):

v = a_1u_1 + ... + a_ku_k + b_1v_1 + ... + b_{n−k}v_{n−k},
and apply T, noting that T(u_i) = 0:
w = 0 + ... + 0 + b_1w_1 + ... + b_{n−k}w_{n−k}.

Thus w is in the span of S, and so S spans im T.

Next, suppose a linear relation
(1.9) c_1w_1 + ... + c_{n−k}w_{n−k} = 0
is given, and consider the linear combination v = c_1v_1 + ... + c_{n−k}v_{n−k}, where v_i
are the vectors (1.8). Applying T to v gives

T(v) = c_1w_1 + ... + c_{n−k}w_{n−k} = 0.
Thus v ∈ ker T. So we may write v in terms of the basis (u_1,..., u_k) of ker T, say
v = a_1u_1 + ... + a_ku_k. Then
−a_1u_1 − ... − a_ku_k + c_1v_1 + ... + c_{n−k}v_{n−k} = −v + v = 0.

But (1.8) is a basis. So −a_1 = 0,..., −a_k = 0, and c_1 = 0,..., c_{n−k} = 0. Therefore
the relation (1.9) was trivial. This shows that S is linearly independent and
completes the proof. □


2. THE MATRIX OF A LINEAR TRANSFORMATION


It is not hard to show that every linear transformation T: F^n → F^m is left
multiplication by some m × n matrix A. To see this, consider the images T(e_j) of the
standard basis vectors e_j of F^n. We label the entries of these vectors as follows:


(2.1) T(e_j) = (a_1j,..., a_mj)^t,


and we form the m × n matrix A = (a_ij) having these vectors as its columns. We can
write an arbitrary vector X = (x_1,..., x_n)^t from F^n in the form X =
e_1x_1 + ... + e_nx_n, putting scalars on the right. Then


T(X) = Σ_j T(e_j)x_j = (a_11,..., a_m1)^t x_1 + ... + (a_1n,..., a_mn)^t x_n = AX.

For example, the linear transformation T: R^2 → R^2 such that


T(e_1) = (1, 2)^t and T(e_2) = (−1, 0)^t


is left multiplication by the matrix


T = [ 1 −1 ]
    [ 2  0 ].

If X = (x_1, x_2)^t = e_1x_1 + e_2x_2, then


T(X) = (1, 2)^t x_1 + (−1, 0)^t x_2 = (x_1 − x_2, 2x_1)^t.
Using the notation established in Section 4 of Chapter 3, we can make a similar
computation with an arbitrary linear transformation T: V → W, once bases of
the two spaces are given. Let B = (v_1,..., v_n) and C = (w_1,..., w_m) be bases of V
and of W, and let us use the shorthand notation T(B) to denote the hypervector


T(B) = (T(v_1),..., T(v_n)).


Since the entries of this hypervector are in the vector space W, and since C is a basis
for that space, there is an m × n matrix A such that


(2.2) T(B) = CA or (T(v_1),..., T(v_n)) = (w_1,..., w_m) A


[Chapter 3 (4.13)]. Remember, this means that for each j,
(2.3) T(v_j) = Σ_i w_i a_ij = w_1a_1j + ... + w_ma_mj.


So A is the matrix whose jth column is the coordinate vector of T(v_j). This m × n
matrix A = (a_ij) is called the matrix of T with respect to the bases B, C. Different
choices of the bases lead to different matrices.

In the case that V = F^n, W = F^m, and the two bases are the standard bases,
A is the matrix constructed as in (2.1).
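
As an illustration of the rule "the jth column of A is the coordinate vector of T(v_j)", here is a short Python sketch, not part of the text, which builds the matrix of the derivative d/dx from P_3 to P_2. We assume the monomial bases (1, x, x^2, x^3) and (1, x, x^2), ordered by increasing degree; a different ordering would permute the rows and columns. The function names are ours.

```python
def derivative_coords(coeffs):
    """coeffs[i] is the coefficient of x^i; returns the coefficients of the derivative."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def matrix_of_derivative(n):
    """Matrix of d/dx: P_n -> P_(n-1) for the bases (1,...,x^n) and (1,...,x^(n-1))."""
    columns = []
    for j in range(n + 1):
        basis_vector = [1 if i == j else 0 for i in range(n + 1)]  # the monomial x^j
        columns.append(derivative_coords(basis_vector))            # coordinates of T(x^j)
    # the collected vectors are the columns of A; convert them to a list of rows
    return [[col[i] for col in columns] for i in range(n)]

A = matrix_of_derivative(3)
for row in A:
    print(row)
# [0, 1, 0, 0]
# [0, 0, 2, 0]
# [0, 0, 0, 3]

# Coordinates of 5 - x + 4x^2 + 2x^3, and of its image under d/dx
X = [5, -1, 4, 2]
Y = [sum(a * x for a, x in zip(row, X)) for row in A]
print(Y)  # [-1, 8, 6], that is, -1 + 8x + 6x^2
```

The matrix has rank 3 and nullity 1, in agreement with the dimension formula: the kernel consists of the constant polynomials.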

The matrix of a linear transformation can be used to compute the coordinates
of the image vector T(v) in terms of the coordinates of v. To do this, we write v in
terms of the basis, say
v = BX = v_1x_1 + ... + v_nx_n.
Then
T(v) = T(v_1)x_1 + ... + T(v_n)x_n = T(B)X = CAX.
Therefore the coordinate vector of T(v) is
Y = AX,


meaning that T(v) = CY. Recapitulating, the matrix A of the linear transformation
has two dual properties:


(2.4) T(B) = CA and Y = AX.


The relationship between T and A can be explained in terms of the
isomorphisms ψ: F^n → V and ψ': F^m → W determined by the two bases [Chapter 3
(4.14)]. If we use ψ and ψ' to identify V and W with F^n and F^m, then T corresponds
to left multiplication by A:

               T
(2.5)     V ------> W            BX ------> T(BX) = CAX
          ^         ^             ^              ^
        ψ |         | ψ'          |              |
         F^n -----> F^m           X  ------>     AX
           mult. by A

Going around this square in the two directions gives the same answer:
T ∘ ψ = ψ' ∘ A.

Thus any linear transformation between finite-dimensional vector spaces V and
W can be identified with matrix multiplication, once bases for the two spaces are
chosen. But if we study changes of basis in V and W, we can do much better. Let us
ask how the matrix A changes when we make other choices of bases for V and W.

Let B' = (v_1',..., v_n'), C' = (w_1',..., w_m') be new bases for these spaces. We can
relate the new basis B' to the old basis B by a matrix P ∈ GL_n(F), as in Chapter 3
(4.19). Similarly, C' is related to C by a matrix Q ∈ GL_m(F). These matrices have
the following properties:


(2.6) PX = X' and QY = Y'.


Here X and X' denote the coordinate vectors of a vector v ∈ V with respect to the
bases B and B', and similarly Y and Y' denote the coordinate vectors of a vector
w ∈ W with respect to C and C'.

Let A' denote the matrix of T with respect to the new bases, defined as above
(2.4), so that A'X' = Y'. Then QAP^{-1}X' = QAX = QY = Y'. Therefore


(2.7) A' = QAP^{-1}.


Note that P and Q are arbitrary invertible n × n and m × m matrices [Chapter 3
(4.23)]. Hence we obtain the following description of the matrices of a given linear
transformation:
(2.8) Proposition. Let A be the matrix of a linear transformation T with respect to
some given bases B, C. The matrices A' which represent T with respect to other bases
are those of the form
A' = QAP^{-1},


where Q ∈ GL_m(F) and P ∈ GL_n(F) are arbitrary invertible matrices. □
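
The computation behind (2.7) can be checked on small matrices. In the following Python sketch, which is an illustration and not part of the text, the matrices A, P, Q and the vector X are hypothetical choices. The inverse of P is written down by hand and verified, rather than computed.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, X):
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# Hypothetical data: A is 2 x 3, so P is 3 x 3 and Q is 2 x 2
A = [[1, 2, 3],
     [4, 5, 6]]
P = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
P_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]
Q = [[2, 1], [1, 1]]

# check that P_inv really is the inverse of P
assert matmul(P, P_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

A_new = matmul(matmul(Q, A), P_inv)   # A' = Q A P^{-1}, formula (2.7)
print(A_new)                          # [[6, 3, 12], [5, 2, 9]]

X = [1, 1, 1]             # old coordinates of a vector v
Y = matvec(A, X)          # old coordinates of T(v)
X_new = matvec(P, X)      # X' = PX, by (2.6)
Y_new = matvec(Q, Y)      # Y' = QY, by (2.6)
print(matvec(A_new, X_new) == Y_new)  # True: A'X' = Y'
```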


Now given a linear transformation T: V → W, it is natural to look for bases
B, C of V and W such that the matrix of T becomes especially nice. In fact the matrix
can be simplified remarkably.


(2.9) Proposition.


(a) Vector space form: Let T: V → W be a linear transformation. Bases B, C can
be chosen so that the matrix of T takes the form


(2.10) A = [ I_r  0 ]
           [ 0    0 ],


where I_r is the r × r identity matrix, and r = rank T.
(b) Matrix form: Given any m × n matrix A, there are matrices Q ∈ GL_m(F) and
P ∈ GL_n(F) so that QAP^{-1} has the form (2.10).


It follows from our discussion that these two assertions amount to the same thing. To
derive (a) from (b), choose arbitrary bases B, C to start with, and let A be the matrix
of T with respect to these bases. Applying (b), we can find P, Q so that QAP^{-1} has the
required form. Let B' = BP^{-1} and C' = CQ^{-1} be the new bases, as in Chapter 3
(4.22). Then the matrix of T with respect to the bases B', C' is QAP^{-1}. So these new
bases are the required ones. Conversely, to derive (b) from (a) we view an arbitrary
matrix A as the matrix of the linear transformation "left multiplication by A", with
respect to the standard bases. Then (a) and (2.7) guarantee the existence of P, Q so
that QAP^{-1} has the required form.

Note that we can interpret QAP^{-1} as the matrix obtained from A by a succession
of row and column operations: We write P and Q as products of elementary
matrices: P = E_p···E_1 and Q = E_q'···E_1' [Chapter 1 (2.18)]. Then QAP^{-1} =
E_q'···E_1'AE_1^{-1}···E_p^{-1}. Because of the associative law, it does not matter whether
the row operations or the column operations are done first. The equation
(E'A)E = E'(AE) tells us that row operations commute with column operations.

It is not hard to prove (2.9b) by matrix manipulation, but let us prove (2.9a)
using bases instead. Let (u_1,..., u_k) be a basis for ker T. Extend to a basis B for
V: (v_1,..., v_r; u_1,..., u_k), where r + k = n. Let w_i = T(v_i). Then, as in the proof of
(1.6), (w_1,..., w_r) is a basis for im T. Extend to a basis C of W: (w_1,..., w_r; x_1,..., x_s).
The matrix of T with respect to these bases has the required form. □
Proposition (2.9) is the prototype for a number of results which will be proved
later. It shows the power of working in vector spaces without fixed bases (or
coordinates), because the structure of an arbitrary linear transformation is related to the
very simple matrix (2.10). It also tells us something remarkable about matrix
multiplication, because left multiplication by A on F^n is a linear transformation. Namely,
it says that left multiplication by A is the same as left multiplication by a matrix of
the form (2.10), but with reference to different coordinate systems. Since
multiplication by the matrix (2.10) is easy to describe, we have learned something new.
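
The matrix manipulation alluded to for (2.9b) can be sketched as follows. This Python fragment is an illustration under our own choices, not the method of the text: it applies row and column operations until the matrix has the form (2.10), but it does not record the matrices P and Q. The example matrix is hypothetical.

```python
from fractions import Fraction

def normal_form(A):
    """Reduce A to the block form (2.10) by row and column operations.

    Returns the reduced matrix and the rank r.
    """
    m = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(m), len(m[0])
    r = 0
    while r < min(rows, cols):
        # look for a nonzero entry in the part of the matrix not yet reduced
        pos = next(((i, j) for i in range(r, rows) for j in range(r, cols)
                    if m[i][j] != 0), None)
        if pos is None:
            break
        i, j = pos
        m[r], m[i] = m[i], m[r]                  # interchange rows r and i
        for row in m:                            # interchange columns r and j
            row[r], row[j] = row[j], row[r]
        pivot = m[r][r]
        m[r] = [a / pivot for a in m[r]]         # make the pivot entry 1
        for i in range(rows):                    # clear column r by row operations
            if i != r and m[i][r] != 0:
                f = m[i][r]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        for j in range(cols):                    # clear row r by column operations
            if j != r and m[r][j] != 0:
                f = m[r][j]
                for row in m:
                    row[j] -= f * row[r]
        r += 1
    return m, r

# Hypothetical 3 x 4 matrix of rank 2
A = [[1, 2, 0, 1],
     [0, 0, 1, 3],
     [1, 2, 1, 4]]
N, r = normal_form(A)
print(r)  # 2
print(N == [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]])  # True
```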


3. LINEAR OPERATORS AND EIGENVECTORS


Let us now consider the case of a linear transformation T: V → V of a vector space
to itself. Such a linear transformation is called a linear operator on V. Left
multiplication by an n × n matrix with entries in F defines a linear operator on the space F^n
of column vectors.

For example, a rotation ρ_θ of the plane through an angle θ is a linear operator
on R^2, whose matrix with respect to the standard basis is


(3.1) R = [ cos θ  −sin θ ]
          [ sin θ   cos θ ].

To verify that this matrix represents a rotation, we write a vector X ∈ R^2 in polar
coordinates, as X = (r, α). Then in rectangular coordinates, X = (r cos α, r sin α)^t. The
addition formulas for sine and cosine show that RX = (r cos(α + θ), r sin(α + θ))^t. So in polar
coordinates, RX = (r, α + θ). This shows that RX is obtained from X by rotation
through the angle θ.

The discussion of the previous section must be changed slightly when we are
dealing with linear operators. It is clear that we want to pick only one basis
B = (v_1,..., v_n) for V, and use it in place of both of the bases B and C considered in
Section 2. In other words, we want to write

(3.2) T(B) = BA

or

T(v_j) = Σ_i v_i a_ij = v_1a_1j + ... + v_na_nj.


This defines the matrix A = (a_ij) of T. It is a square matrix whose jth column is the
coordinate vector of T(v_j) with respect to the basis B. Formula (2.4) is unchanged,
provided that W and C are replaced by V and B. As in the previous section, if X and
Y denote the coordinate vectors of v and T(v) respectively, then


(3.3) Y = AX.
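
The rotation matrix (3.1) gives a convenient numerical check of the formula Y = AX. The Python sketch below is an illustration, not part of the text. The values of r, α and θ are arbitrary choices, and the comparison is made up to floating-point rounding.

```python
import math

def rotation_matrix(theta):
    """The matrix (3.1) of rotation through the angle theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def apply(A, X):
    """Coordinate vector Y = AX."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def to_polar(X):
    """Polar coordinates (radius, angle) of a vector in R^2."""
    return math.hypot(X[0], X[1]), math.atan2(X[1], X[0])

# Arbitrary sample values (alpha + theta stays below pi, so atan2 returns it directly)
r, alpha, theta = 2.0, math.pi / 6, math.pi / 4
X = [r * math.cos(alpha), r * math.sin(alpha)]
Y = apply(rotation_matrix(theta), X)

radius, angle = to_polar(Y)
print(math.isclose(radius, r))             # True: the length is unchanged
print(math.isclose(angle, alpha + theta))  # True: the angle is increased by theta
```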
The new feature arises when we study the effect of a change of basis on V.
Suppose that B is replaced by a new basis B' = (v_1',..., v_n'). Then formula (2.7)
shows that the new matrix A' has the form
(3.4) A' = PAP^{-1},


where P is the matrix of change of basis. Thus the rule for change of basis in a linear
transformation gets replaced by the following rule:


(3.5) Proposition. Let A be the matrix of a linear operator T with respect to a
basis B. The matrices A' which represent T for different bases are those of the form


A' = PAP^{-1},
for arbitrary P ∈ GL_n(F). □


In general, we say that a square matrix A is similar to A' if A' = PAP^{-1} for
some P ∈ GL_n(F). We could also use the word conjugate [see Chapter 2 (3.4)].

Now given A, it is natural to ask for a similar matrix A' which is particularly
simple. One may hope to get a result somewhat like (2.10). But here our allowable
change is much more restricted, because we have only one basis, and therefore one
matrix P, to work with.

We can get some insight into the problem by writing the hypothetical matrix P
as a product of elementary matrices: P = E_r···E_1. Then


PAP^{-1} = E_r···E_1AE_1^{-1}···E_r^{-1}.


In terms of elementary operations, we are allowed to change A by a sequence of
steps A → EAE^{-1}. In other words, we may perform an arbitrary row operation E,
but then we must also make the inverse column operation E^{-1}. Unfortunately, the
row and column operations interfere with each other, and this makes the direct
analysis of such operations confusing. I don't know how to use them. It is remarkable
that a great deal can be done by another method.

The main tools for analyzing linear operators are the concepts of eigenvector
and invariant subspace.

Let T: V → V be a linear operator on a vector space. A subspace W of V is
called an invariant subspace or a T-invariant subspace if it is carried to itself by the
operator:


(3.6) TW ⊂ W.


In other words, W is T-invariant if T(w) ∈ W for all w ∈ W. When this is so, T
defines a linear operator on W, called the restriction of T to W.
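
Since T is linear, condition (3.6) needs to be tested only on a basis of W: the subspace is T-invariant if and only if adjoining each image T(w_j) to the basis does not raise the rank. The Python sketch below illustrates this for left multiplication by a matrix. It is not part of the text; the matrix M and the subspaces W and U are hypothetical examples.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def is_invariant(M, basis):
    """True if the span of `basis` is carried to itself by left multiplication by M."""
    d = rank(basis)
    # M w lies in the span exactly when adjoining it leaves the rank unchanged
    return all(rank(basis + [matvec(M, w)]) == d for w in basis)

# Hypothetical operator on Q^3
M = [[2, 1, 1],
     [0, 3, 1],
     [0, 0, 4]]
W = [[1, 1, 0], [1, 0, 0]]   # the plane spanned by e_1 and e_2
U = [[0, 0, 1]]              # the line spanned by e_3

print(is_invariant(M, W))  # True
print(is_invariant(M, U))  # False: M e_3 = (1, 1, 4) is not a multiple of e_3
```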

Let W be a T-invariant subspace, and let us choose a basis B of V by appending
some vectors to a basis (w_1,..., w_k) of W:


B = (w_1,..., w_k; v_1,..., v_{n−k}).


Then the fact that W is invariant can be read off from the matrix M of T. For, the
columns of this matrix are the coordinate vectors of the image vectors [see (2.3)],
and T(w_j) is in the subspace W, so it is a linear combination of the basis (w_1,..., w_k).
So when we write T(w_j) in terms of the basis B, the coefficients of the vectors
v_1,..., v_{n−k} are zero. It follows that M has the block form


(3.7) M = [ A  B ]
          [ 0  D ],


where A is a k × k matrix. Moreover, A is the matrix of the restriction of T to W.

Suppose that V = W_1 ⊕ W_2 is the direct sum of two T-invariant subspaces, and
let B_i be a basis of W_i. Then we can make a basis B of V by listing the elements of B_1
and B_2 in succession [Chapter 3 (6.6a)]. In this case the matrix of T will have the
block diagonal form


(3.8) M = [ A_1  0   ]
          [ 0    A_2 ],


where A_i is the matrix of T restricted to W_i.
The concept of an eigenvector is closely related to that of an invariant
subspace. An eigenvector v for a linear operator T is a nonzero vector such that


(3.9) T(v) = cv, v ≠ 0,


for some scalar c ∈ F. Here c is allowed to take the value 0, but the vector v can
not be zero. Geometrically, if V = R^n, an eigenvector is a nonzero vector v such
that v and T(v) are parallel.

The scalar c appearing in (3.9) is called the eigenvalue associated to the
eigenvector v. When we speak of an eigenvalue of a linear operator T, we mean a scalar
c ∈ F which is the eigenvalue associated to some eigenvector.

For example, the standard basis vector e_1 is an eigenvector for left multiplication
by the matrix


$\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}$.


The eigenvalue associated to the eigenvector e_1 is 3. Or, the vector (0, 1, 1)^t is an
eigenvector for multiplication by the matrix


$A = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & 1 \\ 3 & 0 & 2 \end{bmatrix}$


on the space R^3 of column vectors, and its eigenvalue is 2.
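
The defining relation AX = cX can be checked mechanically. The following is a minimal sketch in exact arithmetic, not part of the original text; the helper names `mat_vec` and `eigenvalue_of` are hypothetical, and the entries of the two example matrices are assumptions chosen so that e_1 has eigenvalue 3 and (0, 1, 1)^t has eigenvalue 2, as stated above.

```python
from fractions import Fraction

def mat_vec(A, X):
    """Multiply the matrix A (a list of rows) by the column vector X (a list)."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def eigenvalue_of(A, X):
    """Return c if AX = cX for the nonzero vector X, and None otherwise."""
    if all(x == 0 for x in X):
        raise ValueError("an eigenvector must be nonzero")
    AX = mat_vec(A, X)
    # A nonzero coordinate of X determines the only possible value of c.
    i = next(k for k, x in enumerate(X) if x != 0)
    c = Fraction(AX[i], X[i])
    return c if all(ax == c * x for ax, x in zip(AX, X)) else None

A2 = [[3, 1], [0, 2]]                      # assumed 2 x 2 example
A3 = [[1, 1, -1], [2, 1, 1], [3, 0, 2]]    # assumed 3 x 3 example

print(eigenvalue_of(A2, [1, 0]))      # e_1 and the matrix A2
print(eigenvalue_of(A3, [0, 1, 1]))   # (0, 1, 1)^t and the matrix A3
print(eigenvalue_of(A2, [0, 1]))      # e_2 is not an eigenvector of A2
```

Only integer entries are handled here, since `Fraction` is used to read off the candidate scalar without rounding.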

Sometimes eigenvectors and eigenvalues are called characteristic vectors and 
characteristic values. 

Let v be an eigenvector for a linear operator T. The subspace W spanned by v is
T-invariant, because T(av) = acv ∈ W for all a ∈ F. Conversely, if this subspace
is invariant, then v is an eigenvector. So an eigenvector can be described as a basis
of a one-dimensional T-invariant subspace. If v is an eigenvector, and if we extend it
to a basis (v_1 = v, ..., v_n) of V, then the matrix of T will have the block form


$\begin{bmatrix} c & B \\ 0 & D \end{bmatrix}$,


where c is the eigenvalue associated to v_1. This is the block decomposition (3.7) in
the case of an invariant subspace of dimension 1.

When we speak of an eigenvector for an n × n matrix A, we mean a vector
which is an eigenvector for left multiplication by A, a nonzero column vector such
that


AX = cX, for some c ∈ F.


As before, the scalar c is called an eigenvalue. Suppose that A is the matrix of T with
respect to a basis B, and let X denote the coordinate vector of a vector v ∈ V. Then
T(v) has coordinates AX (2.4). Hence X is an eigenvector for A if and only if v is an
eigenvector for T. Moreover, if so, then the eigenvalues are the same: T and A have
the same eigenvalues.


(3.10) Corollary. Similar matrices have the same eigenvalues.


This follows from the fact (3.5) that similar matrices represent the same linear
transformation. □


Eigenvectors aren't always easy to find, but it is easy to tell whether or not a 
given vector X is an eigenvector for a matrix A. We need only check whether or not 
AX is a multiple of X. So we can tell whether or not a given vector v is an eigenvector
for a linear operator T, provided that the coordinate vector of v and the matrix of
T with respect to a basis are known. If we do this for one of the basis vectors, we
find the following criterion:


(3.11) The basis vector v_j is an eigenvector of T, with eigenvalue c,
if and only if the jth column of A has the form ce_j.

For the matrix A is defined by the property T(v_j) = v_1 a_{1j} + ... + v_n a_{nj}. So if
T(v_j) = cv_j, then a_{jj} = c and a_{ij} = 0 if i ≠ j. □


(3.12) Corollary. With the above notation, A is a diagonal matrix if and only if
every basis vector v_j is an eigenvector. □


(3.13) Corollary. The matrix A of a linear transformation is similar to a diagonal
matrix if and only if there is a basis B' = (v_1', ..., v_n') of V made up of eigenvectors. □


This last corollary shows that we can represent a linear operator very simply
by a diagonal matrix, provided that it has enough eigenvectors. We will see in Section 4
that every linear operator on a complex vector space has at least one eigenvector,
and in Section 6 that in most cases the eigenvectors form a basis. But a linear
operator on a real vector space needn't have an eigenvector. For example, the rotation
ρ_θ (3.1) of the plane does not carry any vector to a parallel one, unless θ = 0 or
π. So ρ_θ has no eigenvector unless θ = 0 or π.

The situation is quite different for real matrices having positive entries. Such 
matrices are sometimes called positive matrices. They occur often in applications,
and one of their most important properties is that they always have an eigenvector
whose coordinates are positive (a positive eigenvector). Instead of proving this fact,
let us illustrate it in the case of two variables by examining the effect of multiplication
by a positive 2 × 2 matrix A on R^2.

Let w_i = Ae_i. The parallelogram law for vector addition shows that A sends the
first quadrant S to the sector bounded by the vectors w_1, w_2. And the coordinate vector
of w_i is the ith column of A. Since the entries of A are positive, the vectors w_i lie
in the first quadrant. So A carries the first quadrant to itself: S ⊃ AS. Applying A
again, we find AS ⊃ A^2 S, and so on:


(3.14) S ⊃ AS ⊃ A^2 S ⊃ A^3 S ⊃ ...,


as illustrated below in Figure (3.15) for the matrix $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$.


(3.15) Figure. Images of the first quadrant under repeated multiplication
by a positive matrix.


Now the intersection of a nested set of sectors is either a sector or a half line.
In our case, the intersection Z = ∩ A^r S turns out to be a half line. This is intuitively
plausible, and it can be shown in various ways. The proof is left as an exercise. We
multiply the relation Z = ∩ A^r S on both sides by A:


$AZ = A\Bigl(\bigcap_{r=0}^{\infty} A^r S\Bigr) = \bigcap_{r=1}^{\infty} A^r S.$


Since S ⊃ AS, the intersection on the right is again Z.
Hence Z = AZ. This shows that the nonzero vectors in Z are eigenvectors. □
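
The shrinking of the sectors can be watched numerically. The sketch below is an illustration rather than a proof; it assumes that the matrix of Figure (3.15) has rows (3, 2) and (1, 4), which is consistent with the characteristic polynomial c^2 − 7c + 10 found in Section 4, and the helper name `sector_slopes` is hypothetical. It computes the slopes of the two rays that bound A^k S.

```python
from fractions import Fraction

A = [[3, 2], [1, 4]]  # assumed to be the matrix of Figure (3.15)

def mat_vec(A, X):
    """Multiply the matrix A (a list of rows) by the column vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def sector_slopes(A, k):
    """Slopes of the rays through A^k e_1 and A^k e_2, which bound A^k S (k >= 1)."""
    w1, w2 = [1, 0], [0, 1]
    for _ in range(k):
        w1, w2 = mat_vec(A, w1), mat_vec(A, w2)
    return Fraction(w1[1], w1[0]), Fraction(w2[1], w2[0])

for k in range(1, 6):
    low, high = sector_slopes(A, k)
    print(k, float(low), float(high))
```

For this matrix both slopes approach 1, so the sectors close down on the half line through (1, 1)^t, and indeed A(1, 1)^t = (5, 5)^t.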


4. THE CHARACTERISTIC POLYNOMIAL


In this section we determine the eigenvectors of an arbitrary linear operator T. Recall
that an eigenvector for T is a nonzero vector v such that


(4.1) T(v) = cv,


for some c in F. At first glance, it seems difficult to find eigenvectors if the matrix of
the linear operator is complicated. The trick is to solve a different problem, namely
to determine the eigenvalues first. Once an eigenvalue c is determined, equation
(4.1) becomes linear in the coordinates of v, and solving it presents no problem.
We begin by writing (4.1) in the form


(4.2) [T − cI](v) = 0,


where I stands for the identity operator and T − cI is the linear operator defined by


(4.3) [T − cI](v) = T(v) − cv.


It is easy to check that T − cI is indeed a linear operator. If A is the matrix of T with
respect to some basis, then the matrix of T − cI is A − cI.
We can restate (4.2) as follows:


(4.4) v is in the kernel of T − cI.


(4.5) Lemma. The following conditions on a linear operator T: V → V on a
finite-dimensional vector space are equivalent:


(a) ker T > 0.
(b) im T < V.
(c) If A is the matrix of the operator with respect to an arbitrary basis, then
det A = 0.
(d) 0 is an eigenvalue of T.


Proof. The dimension formula (1.6) shows that ker T > 0 if and only if
im T < V. This is true if and only if T is not an isomorphism, or, equivalently, if
and only if A is not an invertible matrix. And we know that the square matrices A
which are not invertible are those with determinant zero. This shows the equivalence
of (a), (b), and (c). Finally, the nonzero vectors in the kernel of T are the
eigenvectors with eigenvalue zero. Hence (a) is equivalent to (d). □


The conditions (4.5a) and (4.5b) are not equivalent for infinite-dimensional
vector spaces. For example, let V = R^∞ be the space of infinite row vectors
(a_1, a_2, ...), as in Section 5 of Chapter 3. The shift operator, defined by


(4.6) T(a_1, a_2, ...) = (0, a_1, a_2, ...),


is a linear operator on V. For this operator, ker T = 0 but im T < V.


(4.7) Definition. A linear operator T on a finite-dimensional vector space V is
called singular if it satisfies any of the equivalent conditions of (4.5). Otherwise, T
is nonsingular.


We know that c is an eigenvalue for the operator T if and only if T − cI has a
nonzero kernel (4.4). So, if we replace T by T − cI in the lemma above, we find:


(4.8) Corollary. The eigenvalues of a linear operator T are the scalars c ∈ F such
that T − cI is singular. □


If A is the matrix of T with respect to some basis, then the matrix of T − cI is
A − cI. So T − cI is singular if and only if det(A − cI) = 0. This determinant can
be computed explicitly, and doing so provides us with a concrete method for determining
the eigenvalues and eigenvectors.

Suppose for example that A is the matrix


(4.9) $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$,


whose action on R^2 is illustrated in Figure (3.15). Then


$A - cI = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} - \begin{bmatrix} c & 0 \\ 0 & c \end{bmatrix} = \begin{bmatrix} 3 - c & 2 \\ 1 & 4 - c \end{bmatrix}$


and


det(A − cI) = c^2 − 7c + 10 = (c − 5)(c − 2).


This determinant vanishes if c = 5 or 2, so we have shown that the eigenvalues of A
are 5 and 2. To find the eigenvectors, we solve the two systems of linear equations
[A − 5I]X = 0 and [A − 2I]X = 0. The solutions are unique up to scalar factor:


(4.10) $v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$.


Note that the eigenvector v_1 with eigenvalue 5 is in the first quadrant. It lies on the
half line Z which is illustrated in Figure (3.15).
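
The two steps of this method, first the eigenvalues and then a linear system for each of them, can be sketched for real 2 × 2 matrices as follows. This is an illustrative sketch only: the function name `eigenpairs_2x2` is hypothetical, the entries (3, 2; 1, 4) are those assumed here for the matrix (4.9), and floating-point square roots are used, so the results are exact only when the discriminant is a perfect square.

```python
import math

def eigenpairs_2x2(A):
    """Real eigenvalues of a 2 x 2 matrix, each with one eigenvector.

    The eigenvalues are the roots of det(A - cI) = c^2 - (a + d)c + (ad - bc);
    an empty list is returned when there is no real root.
    """
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        return []
    root = math.sqrt(disc)
    pairs = []
    for lam in sorted({(trace + root) / 2, (trace - root) / 2}, reverse=True):
        # Solve (A - lam I)X = 0 using whichever row is visibly nonzero.
        if b != 0:
            v = (b, lam - a)
        elif c != 0:
            v = (lam - d, c)
        else:                      # A is diagonal
            v = (1, 0) if lam == a else (0, 1)
        pairs.append((lam, v))
    return pairs

for lam, v in eigenpairs_2x2([[3, 2], [1, 4]]):
    print(lam, v)
```

The eigenvectors returned are scalar multiples of those in (4.10), which is all one can expect, since the solutions are unique only up to scalar factor.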
We now make the same computation with an arbitrary matrix. It is convenient
to change sign. Obviously det(cI − A) = 0 if and only if det(A − cI) = 0. Also, it
is customary to replace the symbol c by a variable t. We form the matrix tI − A:


(4.11) $tI - A = \begin{bmatrix} t - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & t - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & t - a_{nn} \end{bmatrix}$.


Then the complete expansion of the determinant [Chapter 1 (4.11)] shows that
det(tI − A) is a polynomial of degree n in t, whose coefficients are scalars.


(4.12) Definition. The characteristic polynomial of a linear operator T is the polynomial


p(t) = det(tI − A),


where A is the matrix of T with respect to some basis.


The eigenvalues of T are determined by combining (4.8) and (4.12): c is an
eigenvalue if and only if p(c) = 0.


(4.13) Corollary. The eigenvalues of a linear operator are the roots of its characteristic
polynomial. □


(4.14) Corollary. The eigenvalues of an upper or lower triangular matrix are its
diagonal entries.


Proof. If A is an upper triangular matrix, then so is tI − A. The determinant
of a triangular matrix is the product of its diagonal entries, and the diagonal entries
of tI − A are t − a_{ii}. Therefore the characteristic polynomial is p(t) =
(t − a_{11})(t − a_{22})···(t − a_{nn}), and its roots, the eigenvalues, are a_{11}, ..., a_{nn}. □


We can compute the characteristic polynomial of an arbitrary 2 × 2 matrix


$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$


without difficulty. It is


(4.15) $\det(tI - A) = \det \begin{bmatrix} t - a & -b \\ -c & t - d \end{bmatrix} = t^2 - (a + d)t + (ad - bc).$


The discriminant of this polynomial is


(4.16) (a + d)^2 − 4(ad − bc) = (a − d)^2 + 4bc.


If the entries of A are positive real numbers, then the discriminant is also positive,
and therefore the characteristic polynomial has real roots, as predicted at the end of
Section 3.


(4.17) Proposition. The characteristic polynomial of an operator T does not depend
on the choice of a basis.


Proof. A second basis leads to a matrix A' = PAP^{-1} [see (3.4)]. We have


tI − A' = tI − PAP^{-1} = P(tI)P^{-1} − PAP^{-1} = P(tI − A)P^{-1}.


Thus


det(tI − A') = det(P(tI − A)P^{-1}) = det P det(tI − A) det P^{-1} = det(tI − A).


So the characteristic polynomials computed with A and A' are equal, as was asserted. □


(4.18) Proposition. The characteristic polynomial p(t) has the form


p(t) = t^n − (tr A)t^{n−1} + (intermediate terms) + (−1)^n (det A),


where tr A, the trace of A, is the sum of the diagonal entries:


tr A = a_{11} + a_{22} + ... + a_{nn}.


All coefficients are independent of the basis. For instance tr PAP^{-1} = tr A.


This is proved by computation. The independence of the basis follows from (4.17). □
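
The independence of the basis can be spot-checked in the 2 × 2 case, where the characteristic polynomial is t^2 − (tr A)t + det A. In the sketch below, which is a check on one example and not a proof, the matrix P is an arbitrary invertible matrix chosen for illustration, the matrix A is an arbitrary example, and the helper names are hypothetical.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(P):
    """Inverse of an invertible 2 x 2 matrix, in exact arithmetic."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("P is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def char_poly_2x2(A):
    """Coefficients (1, -tr A, det A) of t^2 - (tr A)t + det A."""
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)

A = [[3, 2], [1, 4]]
P = [[2, 1], [1, 1]]            # an arbitrary invertible matrix
A_prime = mat_mul(mat_mul(P, A), inverse_2x2(P))   # A' = PAP^{-1}
print(A_prime)
print(char_poly_2x2(A), char_poly_2x2(A_prime))
```

The entries of A' differ from those of A, but the trace and determinant, and hence the whole characteristic polynomial, agree.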


Since the characteristic polynomial, the trace, and the determinant are inde- 
pendent of the basis, they depend only on the operator T. So we may define the 
terms characteristic polynomial, trace, and determinant of a linear operator T to be 
those obtained using the matrix of T with respect to an arbitrary basis. 


(4.19) Proposition. Let T be a linear operator on a finite-dimensional vector space V.


(a) If V has dimension n, then T has at most n eigenvalues.
(b) If F is the field of complex numbers and V ≠ 0, then T has at least one eigenvalue,
and hence it has an eigenvector.


Proof.


(a) A polynomial of degree n can have at most n different roots. This is true for
any field F, though we have not proved it yet [see Chapter 11, (1.8)]. So we
can apply (4.13).
(b) Every polynomial of positive degree with complex coefficients has at least one
complex root. This fact is called the Fundamental Theorem of Algebra. There
is a proof in Chapter 13 (9.1). □


For example, let A be the rotation (3.1) of the real plane R^2 by an angle θ. Its
characteristic polynomial is


(4.20) p(t) = t^2 − (2 cos θ)t + 1,


which has no real root unless cos θ = ±1. But if we view A as an operator on C^2,
there are two complex eigenvalues.
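
The two complex eigenvalues can be computed from (4.20) by the quadratic formula. The following sketch uses floating-point arithmetic and a hypothetical function name; for 0 < θ < π it returns numbers that agree, up to rounding, with cos θ ± i sin θ, both of absolute value 1.

```python
import cmath
import math

def rotation_eigenvalues(theta):
    """Roots of p(t) = t^2 - (2 cos theta)t + 1, computed over C."""
    b = -2 * math.cos(theta)
    # The discriminant b^2 - 4 = -4 sin^2(theta) is never positive.
    root = cmath.sqrt(b * b - 4)
    return (-b + root) / 2, (-b - root) / 2

theta = math.pi / 3
for c in rotation_eigenvalues(theta):
    print(c, abs(c))
```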


5. ORTHOGONAL MATRICES AND ROTATIONS


In this section we describe the rotations of two- and three-dimensional spaces R^2
and R^3 about the origin as linear operators. We have already noted (3.1) that a rotation
of R^2 through an angle θ is represented as multiplication by the matrix


$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.


A rotation of R^3 about the origin can be described by a pair (v, θ) consisting of a unit
vector v, a vector of length 1, which lies in the axis of rotation, and a nonzero angle
θ, the angle of rotation. The two pairs (v, θ) and (−v, −θ) represent the same rotation.
We also consider the identity map to be a rotation, though its axis is indeterminate.


(5.1) Figure.


The matrix representing a rotation through the angle θ about the vector e_1 is
obtained easily from the 2 × 2 rotation matrix. It is


(5.2) $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$.


Multiplication by A fixes the first coordinate x_1 of a vector and operates by rotation
on (x_2, x_3)^t. All rotations of R^3 are linear operators, but their matrices can be fairly
complicated. The object of this section is to describe these rotation matrices.

A real n × n matrix A is called orthogonal if A^t = A^{-1}, or, equivalently, if
A^t A = I. The orthogonal n × n matrices form a subgroup of GL_n(R) denoted by O_n
and called the orthogonal group:


(5.3) O_n = {A ∈ GL_n(R) | A^t A = I}.


The determinant of an orthogonal matrix is ±1, because if A^t A = I, then


(det A)^2 = (det A^t)(det A) = 1.


The orthogonal matrices having determinant +1 form a subgroup called the special
orthogonal group and denoted by SO_n:


(5.4) SO_n = {A ∈ GL_n(R) | A^t A = I, det A = 1}.


This subgroup has one coset in addition to SO_n, namely the set of elements with
determinant −1. So it has index 2 in O_n.
The main fact which we will prove about rotations is stated below:
The main fact which we will prove about rotations is stated below: 


(5.5) Theorem. The rotations of R^2 or R^3 about the origin are the linear operators
whose matrices with respect to the standard basis are orthogonal and have determinant
1. In other words, a matrix A represents a rotation of R^2 (or R^3) if and only if
A ∈ SO_2 (or SO_3).


Note the following corollary:


(5.6) Corollary. The composition of two rotations of R^3 about the origin is also a
rotation.


This corollary follows from the theorem because the matrix representing the composition
of two linear operators is the product matrix, and because SO_3, being a subgroup
of GL_3(R), is closed under products. It is far from obvious geometrically.
Clearly, the composition of two rotations about the same axis is also a rotation about
that axis. But imagine composing rotations about different axes. What is the axis of
rotation of the composed operator?

Because their elements represent rotations, the groups SO_2 and SO_3 are called
the two- and three-dimensional rotation groups. Things become more complicated in
dimension > 3. For example, the matrix


(5.7) $\begin{bmatrix} \cos\theta & -\sin\theta & & \\ \sin\theta & \cos\theta & & \\ & & \cos\eta & -\sin\eta \\ & & \sin\eta & \cos\eta \end{bmatrix}$


is an element of SO_4. Left multiplication by this matrix is the composition of a rotation
through the angle θ on the first two coordinates and a rotation through the angle
η on the last two. Such an operation can not be realized as a single rotation.

The proof of Theorem (5.5) is not very difficult, but it would be clumsy if we 
did not first introduce some terminology. So we will defer the proof to the end of 
the section. 

To understand the relationship between orthogonal matrices and rotations, we
will need the dot product of vectors. By definition, the dot product of column vectors
X and Y is


(5.8) (X · Y) = x_1 y_1 + x_2 y_2 + ... + x_n y_n.


It is sometimes useful to write the dot product in matrix form as


(5.9) (X · Y) = X^t Y.


There are two main properties of the dot product of vectors in R^2 and R^3. The
first is that (X · X) is the square of the length of the vector:


|X|^2 = x_1^2 + x_2^2 or x_1^2 + x_2^2 + x_3^2,


according to the case. This property, which follows from Pythagoras's theorem, is
the basis for the definition of length of vectors in R^n: The length of X is defined by
the formula


(5.10) $|X| = \sqrt{(X \cdot X)} = \sqrt{x_1^2 + \cdots + x_n^2}$.


The distance between two vectors X, Y is defined to be the length |X − Y| of X − Y.
The second important property of dot product in R^2 and R^3 is the formula


(5.11) (X · Y) = |X||Y| cos θ,


where θ is the angle between the vectors. This formula is a consequence of the law
of cosines


c^2 = a^2 + b^2 − 2ab cos θ


for the side lengths a, b, c of a triangle, where θ is the angle subtended by the sides
a, b. To derive (5.11), we apply the law of cosines to the triangle with vertices
0, X, Y. Its side lengths are |X|, |Y| and |X − Y|, so the law of cosines can be written
as


(X − Y · X − Y) = (X · X) + (Y · Y) − 2|X||Y| cos θ.


The left side expands to


(X − Y · X − Y) = (X · X) − 2(X · Y) + (Y · Y),


and formula (5.11) is obtained by comparing terms.

The most important application of (5.11) is that two vectors X and Y are orthogonal,
meaning that the angle θ is π/2, if and only if (X · Y) = 0. This property
is taken as the definition of orthogonality of vectors in R^n:


(5.12) X is orthogonal to Y if (X · Y) = 0.


(5.13) Proposition. The following conditions on a real n × n matrix A are equivalent:


(a) A is orthogonal.
(b) Multiplication by A preserves dot product, that is, (AX · AY) = (X · Y) for all
column vectors X, Y.
(c) The columns of A are mutually orthogonal unit vectors.


A basis consisting of mutually orthogonal unit vectors is called an orthonormal
basis. An orthogonal matrix is one whose columns form an orthonormal basis.
Left multiplication by an orthogonal matrix is also called an orthogonal operator.
Thus the orthogonal operators on R^n are the ones which preserve dot product.


Proof of Proposition (5.13). We write (X · Y) = X^t Y. If A is orthogonal, then
A^t A = I, so


(X · Y) = X^t Y = X^t A^t A Y = (AX)^t (AY) = (AX · AY).


Conversely, suppose that X^t Y = X^t A^t A Y for all X and Y. We rewrite this equality as
X^t B Y = 0, where B = I − A^t A. For any matrix B,


(5.14) e_i^t B e_j = b_{ij}.


So if X^t B Y = 0 for all X, Y, then e_i^t B e_j = b_{ij} = 0 for all i, j, and B = 0. Therefore
I = A^t A. This proves the equivalence of (a) and (b). To prove that (a) and (c) are
equivalent, let A_j denote the jth column of the matrix A. The (i, j) entry of the
product matrix A^t A is (A_i · A_j). Thus A^t A = I if and only if (A_i · A_i) = 1 for all i,
and (A_i · A_j) = 0 for all i ≠ j, which is to say that the columns have length 1 and are
orthogonal. □
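
Conditions (b) and (c) are easy to test numerically. The sketch below, with hypothetical helper names and an arbitrarily chosen angle and pair of vectors, checks them in floating-point arithmetic for the rotation matrix (5.2); a small tolerance is needed because cos θ and sin θ are only approximated.

```python
import math

def rotation_about_e1(theta):
    """The matrix (5.2): rotation of R^3 through theta about e_1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def mat_vec(A, X):
    """Multiply the matrix A (a list of rows) by the column vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def dot(X, Y):
    """The dot product (X . Y)."""
    return sum(x * y for x, y in zip(X, Y))

def is_orthogonal(A, tol=1e-12):
    """Check condition (c): the columns of A are mutually orthogonal unit vectors."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    return all(abs(dot(cols[i], cols[j]) - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

A = rotation_about_e1(0.4)
X, Y = [1.0, 2.0, 3.0], [-2.0, 0.5, 4.0]
print(is_orthogonal(A))
print(dot(mat_vec(A, X), mat_vec(A, Y)), dot(X, Y))
```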


The geometric meaning of multiplication by an orthogonal matrix can be explained
in terms of rigid motions. A rigid motion or isometry of R^n is a map
m: R^n → R^n which is distance preserving; that is, it is a map satisfying the following
condition: If X, Y are points of R^n, then the distance from X to Y is equal to the
distance from m(X) to m(Y):


(5.15) |m(X) − m(Y)| = |X − Y|.


Such a rigid motion carries a triangle to a congruent triangle, and therefore it preserves
angles and shapes in general.

Note that the composition of two rigid motions is a rigid motion, and that the
inverse of a rigid motion is a rigid motion. Therefore the rigid motions of R^n form a
group M_n, with composition of operations as its law of composition. This group is
called the group of motions.


(5.16) Proposition. Let m be a map R^n → R^n. The following conditions on m
are equivalent:


(a) m is a rigid motion which fixes the origin.
(b) m preserves dot product; that is, for all X, Y ∈ R^n, (m(X) · m(Y)) = (X · Y).
(c) m is left multiplication by an orthogonal matrix.


(5.17) Corollary. A rigid motion which fixes the origin is a linear operator.


This follows from the equivalence of (a) and (c).


Proof of Proposition (5.16). We will use the shorthand ' to denote the map m, writing
m(X) = X'. Suppose that m is a rigid motion fixing 0. With the shorthand notation,
the statement (5.15) that m preserves distance reads


(5.18) (X' − Y' · X' − Y') = (X − Y · X − Y)


for all vectors X, Y. Setting Y = 0 shows that (X' · X') = (X · X) for all X. We expand
both sides of (5.18) and cancel (X · X) and (Y · Y), obtaining (X' · Y') =
(X · Y). This shows that m preserves dot product, hence that (a) implies (b).

To prove that (b) implies (c), we note that the only map which preserves dot
product and which also fixes each of the basis vectors e_i is the identity. For, if m
preserves dot product, then (X · e_j) = (X' · e_j') for any X. If e_j' = e_j as well, then


x_j = (X · e_j) = (X' · e_j') = (X' · e_j) = x_j'


for all j. Hence X = X', and m is the identity.

Now suppose that m preserves dot product. Then the images e_1', ..., e_n' of the
standard basis vectors are orthonormal: (e_i' · e_i') = 1 and (e_i' · e_j') = 0 if i ≠ j.
Let B' = (e_1', ..., e_n'), and let A = [B']. According to Proposition (5.13), A is an
orthogonal matrix. Since the orthogonal matrices form a group, A^{-1} is also orthogonal.
This being so, multiplication by A^{-1} preserves dot product too. So the composed
motion A^{-1}m preserves dot product, and it fixes each of the basis vectors e_i. Therefore
A^{-1}m is the identity map. This shows that m is left multiplication by A, as required.

Finally, if m is a linear operator whose matrix A is orthogonal, then
X' − Y' = (X − Y)' because m is linear, and |X' − Y'| = |(X − Y)'| = |X − Y| by
(5.13b). So m is a rigid motion. Since a linear operator also fixes 0, this shows that
(c) implies (a). □


One class of rigid motions which do not fix the origin, and which are therefore
not linear operators, is the translations. Given any fixed vector b = (b_1, ..., b_n)^t in
R^n, translation by b is the map


(5.19) $t_b(X) = X + b = \begin{bmatrix} x_1 + b_1 \\ \vdots \\ x_n + b_n \end{bmatrix}$.


This map is a rigid motion because t_b(X) − t_b(Y) = (X + b) − (Y + b) = X − Y,
and hence |t_b(X) − t_b(Y)| = |X − Y|.


(5.20) Proposition. Every rigid motion m is the composition of an orthogonal linear
operator and a translation. In other words, it has the form m(X) = AX + b for
some orthogonal matrix A and some vector b.


Proof. Let b = m(0). Then t_{−b}(b) = 0, so the composed operation t_{−b}m is a
rigid motion which fixes the origin: t_{−b}(m(0)) = 0. According to Proposition (5.16),
t_{−b}m is left multiplication by an orthogonal matrix A: t_{−b}m(X) = AX. Applying t_b to
both sides of this equation, we find m(X) = AX + b.

Note that both the vector b and the matrix A are uniquely determined by m, because
b = m(0) and A is the operator t_{−b}m. □
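
The proof is constructive: b is the image of the origin, and the columns of A are the vectors m(e_j) − b. A minimal sketch of this recipe follows, assuming that m is given as a Python function on lists of coordinates and that it really is a rigid motion; the names `decompose_rigid_motion` and `example_motion` are hypothetical, and the example motion is made up for illustration.

```python
def decompose_rigid_motion(m, n):
    """Return (A, b) with m(X) = AX + b, assuming m is a rigid motion of R^n."""
    b = m([0] * n)                       # b = m(0)
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        # t_{-b}m fixes the origin, so its matrix has columns m(e_j) - b.
        columns.append([y - c for y, c in zip(m(e_j), b)])
    A = [[columns[j][i] for j in range(n)] for i in range(n)]
    return A, b

def example_motion(X):
    """Hypothetical motion of R^2: a quarter turn followed by translation by (1, 2)."""
    x, y = X
    return [-y + 1, x + 2]

A, b = decompose_rigid_motion(example_motion, 2)
print(A, b)
```

The function does not verify that m is distance preserving; if it is not, the returned pair need not reproduce m.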


Recall that the determinant of an orthogonal matrix is ±1. An orthogonal operator
is called orientation-preserving if its determinant is +1, and orientation-reversing
if its determinant is −1. Similarly, let m be a rigid motion. We write
m(X) = AX + b as above. Then m is called orientation-preserving if det A = 1, and
orientation-reversing if det A = −1. A motion of R^2 is orientation-reversing if it
flips the plane over, and orientation-preserving if it does not.

Combining Theorem (5.5) with Proposition (5.16) gives us the following characterization
of rotations:


(5.21) Corollary. The rotations of R^2 and R^3 are the orientation-preserving rigid
motions which fix the origin. □


We now proceed to the proof of Theorem (5.5), which characterizes the rotations
of R^2 and R^3 about the origin. Every rotation ρ is a rigid motion, so Proposition
(5.16) tells us that ρ is multiplication by an orthogonal matrix A. Also, the determinant
of A is 1. This is because det A = ±1 for any orthogonal matrix, and
because the determinant varies continuously with the angle of rotation. When the
angle is zero, A is the identity matrix, which has determinant 1. Thus the matrix of a
rotation is an element of SO_2 or SO_3.

Conversely, let A ∈ SO_2 be an orthogonal 2 × 2 matrix of determinant 1. Let
v_1 denote the first column Ae_1 of A. Since A is orthogonal, v_1 is a unit vector. There
is a rotation R (3.1) such that Re_1 = v_1 too. Then B = R^{-1}A fixes e_1. Also, A and R
are elements of SO_2, and this implies that B is in SO_2. So the columns of B form an
orthonormal basis of R^2, and the first column is e_1. Being of length 1 and orthogonal
to e_1, the second column must be either e_2 or −e_2, and the second case is ruled
out by the fact that det B = 1. It follows that B = I and that A = R. So A is a
rotation.

To prove that an element A of SO_3 represents a rotation, we'd better decide on
a definition of a rotation ρ of R^3 about the origin. We will require the following:


(5.22)

(i) ρ is a rigid motion which fixes the origin;
(ii) ρ also fixes a nonzero vector v;
(iii) ρ operates as a rotation on the plane P orthogonal to v.


According to Proposition (5.16), the first condition is equivalent to saying that ρ is
an orthogonal operator. So our matrix A ∈ SO_3 satisfies this condition. Condition
(ii) can be stated by saying that v is an eigenvector for the operator ρ, with eigenvalue
1. Then since ρ preserves orthogonality, it sends the orthogonal space P to itself.
In other words, P is an invariant subspace. Condition (iii) says that the restriction
of ρ to this invariant subspace is a rotation.

Notice that the matrix (5.2) does satisfy these conditions, with v = e_1.


(5.23) Lemma. Every element A ∈ SO_3 has the eigenvalue 1.


Proof. We will show that det(A − I) = 0. This will prove the lemma [see
(4.8)]. This proof is tricky, but efficient. Recall that det A = det A^t for any matrix A,
so det A^t = 1. Since A is orthogonal, A^t(A − I) = (I − A)^t. Then


det(A − I) = det A^t(A − I) = det(I − A)^t = det(I − A).


On the other hand, for any 3 × 3 matrix B, det(−B) = −det B. Therefore
det(A − I) = −det(I − A), and it follows that det(A − I) = 0. □


Now given a matrix A ∈ SO_3, the lemma shows that left multiplication by A
fixes a nonzero vector v_1. We normalize its length to 1, and we choose orthogonal
unit vectors v_2, v_3 lying in the plane P orthogonal to v_1. Then B = (v_1, v_2, v_3) is an
orthonormal basis of R^3. The matrix P = [B]^{-1} is orthogonal because [B] is orthogonal,
and A' = PAP^{-1} represents the same operator as A does, with respect to the basis
B. Since A and P are orthogonal, so is A'. Also det A' = det A = 1. So A' ∈ SO_3.
Since v_1 is an eigenvector with eigenvalue 1, the first column of A' is e_1. Since
A' is orthogonal, the other columns are orthogonal to e_1, and A' has the block form


$A' = \begin{bmatrix} 1 & 0 \\ 0 & R \end{bmatrix}$.


Using the fact that A' ∈ SO_3, one finds that R ∈ SO_2. So R is a rotation. This shows
that A' has the form (5.2) and that it represents a rotation. Hence A does too. This
completes the proof of Theorem (5.5).
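
As a numerical illustration of Corollary (5.6), one can multiply two rotations about different axes and look for the axis of the product. The sketch below does not follow the proof above. Instead it uses a different, standard shortcut, labeled here as such: for A ∈ SO_3 whose angle of rotation is neither 0 nor π, the vector with coordinates (a_32 − a_23, a_13 − a_31, a_21 − a_12), read off from A − A^t, is nonzero and fixed by A. All function names and the two angles are choices made for this example.

```python
import math

def rot_x(t):
    """Rotation of R^3 through t about e_1, as in (5.2)."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(t):
    """Rotation of R^3 through t about e_3."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_mul(A, B):
    """Product of two 3 x 3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(A, X):
    """Multiply the matrix A by the column vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def det3(A):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def rotation_axis(A):
    """Unit vector fixed by A in SO_3, read off from A - A^t.

    Assumes the angle of rotation is neither 0 nor pi, so that A - A^t != 0.
    """
    v = [A[2][1] - A[1][2], A[0][2] - A[2][0], A[1][0] - A[0][1]]
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

A = mat_mul(rot_x(0.7), rot_z(1.1))   # rotations about different axes
v = rotation_axis(A)
print(det3(A))
print(v, mat_vec(A, v))
```

Up to rounding, the product has determinant 1 and fixes v, so v spans the axis of the composed rotation.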


(5.24) Note. To keep the new basis separate from the old basis, we denoted it by B’ 
in Chapter 3. The prime is not needed when the old basis is the standard basis, and 
since it clutters the notation, we will often drop it, as we did here. 


6. DIAGONALIZATION 


In this section we show that for “most” linear operators on a complex vector space, 
there is a basis such that the matrix of the operator is diagonal. The key fact, which 
we already noted at the end of Section 4, is that every complex polynomial of posi- 
tive degree has a root. This tells us that every linear operator has an eigenvector. 


(6.1) Proposition. 


(a) Vector space form: Let T be a linear operator on a finite-dimensional complex 
vector space V. There is a basis B of V such that the matrix A of T is upper tri- 
angular. 

(b) Matrix form: Every complex n × n matrix A is similar to an upper triangular
matrix. In other words, there is a matrix P ∈ GL_n(C) such that PAP^{-1} is upper
triangular.


Proof. The two assertions are equivalent, because of (3.5). We begin by applying
(4.19b), which shows the existence of an eigenvector, call it v_1'. Extend to a
basis B' = (v_1', ..., v_n') for V. Then by (3.11), the first column of the matrix A' of T
with respect to B' will be (c_1, 0, ..., 0)^t, where c_1 is the eigenvalue of v_1'. Therefore
A' has the form


$A' = \begin{bmatrix} c_1 & * \\ 0 & B \end{bmatrix}$,


where B is an (n − 1) × (n − 1) matrix. The matrix version of this reduction is this:
Given any n × n matrix A, there is a P ∈ GL_n(C) such that A' = PAP^{-1} has the
above form. Now apply induction on n. By induction, we may assume that the existence
of some Q ∈ GL_{n−1}(C) such that QBQ^{-1} is triangular has been proved. Let Q_1
be the n × n matrix


$Q_1 = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}$.


Then


(Q_1 P)A(Q_1 P)^{-1} = Q_1(PAP^{-1})Q_1^{-1} = Q_1 A' Q_1^{-1}


has the form


$\begin{bmatrix} c_1 & * \\ 0 & QBQ^{-1} \end{bmatrix}$,


which is triangular. □


As we mentioned, the important point in the proof is that every complex poly- 
nomial has a root. The same proof will work for any field F, provided that all the 
roots of the characteristic polynomial are in the field. 


(6.2) Corollary. Let F be a field. 


(a) Vector space form: Let T be a linear operator on a finite-dimensional vector 
space V over F, and suppose that the characteristic polynomial of T factors into 
linear factors in the field F. Then there is a basis B of V such that the matrix A 
of T is triangular. 

(b) Matrix form: Let A be an n × n matrix whose characteristic polynomial factors
into linear factors in the field F. There is a matrix P ∈ GL_n(F) such that PAP^{-1}
is triangular.


Proof. The proof is the same, except that to make the induction step one has
to check that the characteristic polynomial of the matrix B is p(t)/(t − c_1), where
p(t) is the characteristic polynomial of A. This is true because p(t) is also the characteristic
polynomial of A' (4.17), and because det(tI − A') = (t − c_1) det(tI − B).
So our hypothesis that the characteristic polynomial factors into linear factors carries
over from A to B. □


Let us now ask which matrices A are similar to diagonal matrices. As we saw 
in (3.12), these are the matrices A which have a basis of eigenvectors. Suppose 
again that F = C, and look at the roots of the characteristic polynomial p(t). Each 
root is the eigenvalue associated to some eigenvector, and an eigenvector has only 
one eigenvalue. Most complex polynomials of degree n have n distinct roots. So 
most complex matrices have n eigenvectors with different eigenvalues, and it is rea- 
sonable to suppose that these eigenvectors may form a basis. This is true. 


(6.3) Proposition. Let v_1, ..., v_r ∈ V be eigenvectors for a linear operator T, with
distinct eigenvalues c_1, ..., c_r. Then the set (v_1, ..., v_r) is linearly independent.


Proof. Induction on r: Suppose that a dependence relation


0 = a_1 v_1 + ... + a_r v_r


is given. We must show that a_i = 0 for all i, and to do so we apply the operator T:


0 = T(0) = a_1 T(v_1) + ... + a_r T(v_r) = a_1 c_1 v_1 + ... + a_r c_r v_r.


This is a second dependence relation among (v_1, ..., v_r). We eliminate v_r from the
two relations, multiplying the first relation by c_r and subtracting the second:


0 = a_1(c_r − c_1)v_1 + ... + a_{r−1}(c_r − c_{r−1})v_{r−1}.


Applying the principle of induction, we assume that (v_1, ..., v_{r−1}) are independent.
Then the coefficients a_1(c_r − c_1), ..., a_{r−1}(c_r − c_{r−1}) are all zero. Since the c_i's are
distinct, c_r − c_i ≠ 0 if i < r. Thus a_1 = ... = a_{r−1} = 0, and the original relation
is reduced to 0 = a_r v_r. Since an eigenvector can not be zero, a_r = 0 too. □


The next theorem follows by combining (3.12) and (6.3):

(6.4) Theorem. Let $T$ be a linear operator on a vector space $V$ of dimension $n$ over
a field $F$. Assume that its characteristic polynomial has $n$ distinct roots in $F$. Then
there is a basis for $V$ with respect to which the matrix of $T$ is diagonal. □

Note that the diagonal entries are determined, except for their order, by the
linear operator $T$. They are the eigenvalues.
When $p(t)$ has multiple roots, there is usually no basis of eigenvectors, and it
is harder to find a nice matrix for $T$. The study of this case leads to what is called the
Jordan canonical form for a matrix, which will be discussed in Chapter 12.
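As an example of diagonalization, consider the matrix

$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$,

whose eigenvectors were computed in (4.10). These eigenvectors form a basis
$B = (v_1, v_2)$ of $\mathbb{R}^2$. According to [Chapter 3 (4.20), see also Note (5.24)], the matrix
relating the standard basis $E$ to this basis $B$ is

(6.5)    $P = [B]^{-1} = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}^{-1} = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}$,

and $PAP^{-1} = A'$ is diagonal:

(6.6)    $PAP^{-1} = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix} = A'$.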


The general rule is stated in Corollary (6.7): 


(6.7) Corollary. If a basis $B$ of eigenvectors of $A$ in $F^n$ is known and if $P = [B]^{-1}$,
then $A' = PAP^{-1}$ is diagonal. □
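
The example can be checked mechanically. The following Python sketch verifies, in exact rational arithmetic, that $P = [B]^{-1}$ diagonalizes the matrix with rows (3, 2) and (1, 4); the helper names `matmul` and `inverse_2x2` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Invert a 2 x 2 matrix exactly, using the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[3, 2], [1, 4]]     # the matrix of the example
B = [[1, 2], [1, -1]]    # columns are the eigenvectors (1, 1)^t and (2, -1)^t
P = inverse_2x2(B)       # P = [B]^{-1}

# P A P^{-1}, using P^{-1} = [B]
A_diag = matmul(matmul(P, A), B)
print(A_diag)
```

The off-diagonal entries vanish and the diagonal carries the eigenvalues 5 and 2, in the order in which the eigenvectors were listed in $B$.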


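The importance of Theorem (6.4) comes from the fact that it is easy to compute
with diagonal matrices. For example, if $A' = PAP^{-1}$ is diagonal, then we can
compute powers of the matrix $A$ using the formula

(6.8)    $A^k = (P^{-1}A'P)^k = P^{-1}A'^kP$.

Thus if $A$ is the matrix (4.9), then

$A^k = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}^k\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 5^k + 2 \cdot 2^k & 2(5^k - 2^k) \\ 5^k - 2^k & 2 \cdot 5^k + 2^k \end{bmatrix}$.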
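
As a sanity check on the power formula, this sketch compares the closed form obtained from diagonalization with plain repeated multiplication. It assumes the matrix in question has rows (3, 2) and (1, 4), with eigenvalues 5 and 2; the function names are hypothetical.

```python
def matmul(X, Y):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power_by_multiplication(A, k):
    """Compute A^k by k successive multiplications."""
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul(result, A)
    return result

def power_by_diagonalization(k):
    """A^k = P^{-1} A'^k P for A = [[3, 2], [1, 4]], written out entry by entry."""
    f, t = 5 ** k, 2 ** k   # k-th powers of the eigenvalues 5 and 2
    # Each numerator is divisible by 3, because 5^k and 2^k
    # leave the same remainder modulo 3.
    return [[(f + 2 * t) // 3, (2 * f - 2 * t) // 3],
            [(f - t) // 3, (2 * f + t) // 3]]

A = [[3, 2], [1, 4]]
agree = all(power_by_diagonalization(k) == power_by_multiplication(A, k)
            for k in range(12))
print(agree)
```

The closed form needs only two scalar powers, whereas repeated multiplication needs k matrix products.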


7. SYSTEMS OF DIFFERENTIAL EQUATIONS 


We learn in calculus that the solutions to the first-order linear differential equation

(7.1)    $\frac{dx}{dt} = ax$

are $x(t) = ce^{at}$, $c$ being an arbitrary constant. Indeed, $ce^{at}$ obviously solves (7.1).
To show that every solution has this form, let $x(t)$ be an arbitrary differentiable
function which is a solution. We differentiate $e^{-at}x(t)$ using the product rule:

$\frac{d}{dt}\left(e^{-at}x(t)\right) = -ae^{-at}x(t) + e^{-at}ax(t) = 0$.

Thus $e^{-at}x(t)$ is a constant $c$, and $x(t) = ce^{at}$.

As an application of diagonalization, we will extend this solution to systems of 
differential equations. In order to write our equations in matrix notation, we use the 
following terminology. A vector-valued function $X(t)$ is a vector whose entries are
functions of $t$. Similarly, a matrix-valued function $A(t)$ is a matrix whose entries are
functions:

(7.2)    $X(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}, \quad A(t) = \begin{bmatrix} a_{11}(t) & \cdots & a_{1n}(t) \\ \vdots & & \vdots \\ a_{m1}(t) & \cdots & a_{mn}(t) \end{bmatrix}$.

The calculus operations of taking limits, differentiating, and so on are ex- 


tended to vector-valued and matrix-valued functions by performing the operations on 
each entry separately. Thus by definition 


(7.3)    $\lim_{t \to t_0} X(t) = \begin{bmatrix} \xi_1 \\ \vdots \\ \xi_n \end{bmatrix}$, where $\xi_i = \lim_{t \to t_0} x_i(t)$.

So this limit exists if and only if $\lim x_i(t)$ exists for each $i$. Similarly, the derivative
of a vector-valued or matrix-valued function is the function obtained by differentiat- 
ing each entry separately: 


$\frac{dX}{dt} = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}, \quad \frac{dA}{dt} = \begin{bmatrix} a_{11}'(t) & \cdots & a_{1n}'(t) \\ \vdots & & \vdots \\ a_{m1}'(t) & \cdots & a_{mn}'(t) \end{bmatrix}$,

where $x_i'(t)$ is the derivative of $x_i(t)$, and so on. So $dX/dt$ is defined if and only if
each of the functions $x_i(t)$ is differentiable. The derivative can also be described in
vector notation, as

(7.4)    $\frac{dX}{dt} = \lim_{h \to 0} \frac{X(t + h) - X(t)}{h}$.

Here $X(t + h) - X(t)$ is computed by vector addition and the $h$ in the denominator
stands for scalar multiplication by $h^{-1}$. The limit is obtained by evaluating the limit
of each entry separately, as above. So the entries of (7.4) are the derivatives $x_i'(t)$.
The same is true for matrix-valued functions.

A system of homogeneous first-order linear, constant-coefficient differential 
equations is a matrix equation of the form 


(7.5)    $\frac{dX}{dt} = AX$,

where $A$ is an $n \times n$ real or complex matrix and $X(t)$ is an $n$-dimensional
vector-valued function. Writing out such a system, we obtain a system of $n$ differential
equations, of the form

(7.6)    $\begin{aligned} \frac{dx_1}{dt} &= a_{11}x_1(t) + \cdots + a_{1n}x_n(t) \\ &\;\;\vdots \\ \frac{dx_n}{dt} &= a_{n1}x_1(t) + \cdots + a_{nn}x_n(t). \end{aligned}$

The $x_i(t)$ are unknown functions, and the $a_{ij}$ are scalars. For example, if we
substitute the matrix $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ for $A$, (7.5) becomes a system of two equations in two
unknowns:

(7.7)    $\begin{aligned} \frac{dx_1}{dt} &= 3x_1 + 2x_2 \\ \frac{dx_2}{dt} &= x_1 + 4x_2. \end{aligned}$


The simplest systems (7.5) are those in which $A$ is a diagonal matrix. Let the
diagonal entries be $a_i$. Then equation (7.5) reads

(7.8)    $\frac{dx_i}{dt} = a_ix_i(t), \quad i = 1, \ldots, n$.

Here the unknown functions $x_i$ are not mixed up by the equations, so we can solve
for each one separately:

(7.9)    $x_i = c_ie^{a_it}$,

for some constant $c_i$.
The observation which allows us to solve the differential equation (7.5) in most
cases is this: If $v$ is an eigenvector for $A$ with eigenvalue $a$, then

(7.10)    $X = e^{at}v$

is a particular solution of (7.5). Here $e^{at}v$ is to be interpreted as the scalar product
of the function $e^{at}$ and the vector $v$. Differentiation operates on the scalar function,
fixing the constant vector $v$, while multiplication by $A$ operates on the vector $v$,
fixing the scalar function $e^{at}$. Thus $\frac{d}{dt}e^{at}v = ae^{at}v = Ae^{at}v$. For example, $(2, -1)^t$ is
an eigenvector with eigenvalue 2 of the matrix $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$, and $\begin{bmatrix} 2e^{2t} \\ -e^{2t} \end{bmatrix}$ solves the
system of differential equations (7.7).
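
A numerical spot check of this observation, offered as a sketch only: it compares a central-difference approximation of $dX/dt$ with $AX$ for $X = e^{2t}(2, -1)^t$. The sample time `t0`, the step size, and the helper names are arbitrary choices made for illustration.

```python
import math

A = [[3, 2], [1, 4]]

def X(t):
    """The special solution e^{2t} v with v = (2, -1)^t."""
    return [2 * math.exp(2 * t), -math.exp(2 * t)]

def derivative(f, t, h=1e-6):
    """Central-difference approximation to the derivative of a vector-valued f."""
    plus, minus = f(t + h), f(t - h)
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

t0 = 0.3                  # arbitrary sample time
lhs = derivative(X, t0)   # approximates dX/dt at t0
rhs = apply(A, X(t0))     # A X(t0), which equals 2 X(t0)
print(lhs, rhs)
```

The two vectors agree up to the error of the finite-difference approximation.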

This observation allows us to solve (7.5) whenever the matrix $A$ has distinct
real eigenvalues. In that case every solution will be a linear combination of the
special solutions (7.10). To work this out, it is convenient to diagonalize. Let us replace
the notation $'$ used in the previous section by $\tilde{\ }$ here, to avoid confusion with
differentiation. Let $P$ be an invertible matrix such that $PAP^{-1} = \tilde{A}$ is diagonal. So
$P = [B]^{-1}$, where $B$ is a basis of eigenvectors. We make the linear change of variable

(7.11)    $\tilde{X} = PX$.

Then

(7.12)    $\frac{d\tilde{X}}{dt} = P\frac{dX}{dt}$.

Substituting into (7.5), we find

(7.13)    $\frac{d\tilde{X}}{dt} = PAP^{-1}\tilde{X} = \tilde{A}\tilde{X}$.

Since $\tilde{A}$ is diagonal, the variables $\tilde{x}_i$ have been separated, so the equation can be
solved in terms of exponentials. The diagonal entries of $\tilde{A}$ are the eigenvalues
$\lambda_1, \ldots, \lambda_n$ of $A$, so the solution of the system (7.13) is

(7.14)    $\tilde{x}_i = c_ie^{\lambda_it}$, for some $c_i$.

Substituting back,

(7.15)    $X = P^{-1}\tilde{X}$

solves the original system (7.5). This proves the following:


(7.16) Proposition. Let $A$ be an $n \times n$ matrix, and let $P$ be an invertible matrix
such that $PAP^{-1} = \tilde{A}$ is diagonal, with diagonal entries $\lambda_1, \ldots, \lambda_n$. The general
solution of the system $\frac{dX}{dt} = AX$ is $X = P^{-1}\tilde{X}$, where $\tilde{x}_i = c_ie^{\lambda_it}$, for some arbitrary
constants $c_i$. □


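The matrix which diagonalizes $A$ in example (7.7) was computed in (6.5):

(7.17)    $P^{-1} = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}$ and $\tilde{A} = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}$.

Thus

(7.18)    $\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix} = \begin{bmatrix} c_1e^{5t} \\ c_2e^{2t} \end{bmatrix}$ and $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = P^{-1}\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix} = \begin{bmatrix} c_1e^{5t} + 2c_2e^{2t} \\ c_1e^{5t} - c_2e^{2t} \end{bmatrix}$.

In other words, every solution is a linear combination of the two basic solutions

$\begin{bmatrix} e^{5t} \\ e^{5t} \end{bmatrix}$ and $\begin{bmatrix} 2e^{2t} \\ -e^{2t} \end{bmatrix}$.

These are the solutions (7.10) corresponding to the eigenvectors $(1, 1)^t$ and $(2, -1)^t$.
The coefficients $c_i$ appearing in these solutions are arbitrary. They are usually
determined by assigning initial conditions, meaning the value of $X$ at some particular $t_0$.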
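
To illustrate how initial conditions fix the constants, here is a small sketch for the system with coefficient rows (3, 2) and (1, 4), taking $t_0 = 0$ (an assumption made for simplicity). Since $\tilde{X} = PX$, the constants are read off from $P\,X(0)$ with $P = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}$; the function name `solution` is hypothetical.

```python
import math

A = [[3, 2], [1, 4]]

def solution(x0, t):
    """Value at time t of the solution of dX/dt = AX with X(0) = x0."""
    # (c1, c2)^t = P X(0), where P = (1/3) [[1, 2], [1, -1]]
    c1 = (x0[0] + 2 * x0[1]) / 3
    c2 = (x0[0] - x0[1]) / 3
    e5, e2 = math.exp(5 * t), math.exp(2 * t)
    # X = c1 e^{5t} (1, 1)^t + c2 e^{2t} (2, -1)^t
    return [c1 * e5 + 2 * c2 * e2, c1 * e5 - c2 * e2]

x0 = [1.0, 0.0]           # sample initial condition, chosen arbitrarily
print(solution(x0, 0.0))  # reproduces x0 up to rounding
print(solution(x0, 0.5))
```

For other values of $t_0$ one would evaluate the exponentials at $t - t_0$ instead of $t$.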

Let us now consider the case that the coefficient matrix $A$ has distinct
eigenvalues, but that they are not all real. To copy the method which we used above, we
must first consider differential equations of the form (7.1), in which $a$ is a complex
number. Properly interpreted, the solutions of such a differential equation still have
the form $ce^{at}$. The only thing to remember is that $e^{at}$ will now be a complex-valued
function of $t$. In order to focus attention, we restrict the variable $t$ to real values
here, although this is not the most natural choice when working with complex-valued
functions. Allowing $t$ to take on complex values would not change things very
much.

The definition of the derivative of a complex-valued function is the same as for 
real-valued functions: 


(7.19)    $\frac{dx}{dt} = \lim_{h \to 0} \frac{x(t + h) - x(t)}{h}$,


provided that this limit exists. There are no new features. We can write any such 
function x(t) in terms of its real and imaginary parts, which will be real-valued 
functions: 


(7.20) x(t) = u(t) + iv(t). 


Then x is differentiable if and only if u and v are differentiable, and if they are, the 
derivative of x is x’ = u' + iv’. This follows directly from the definition. The 
usual rules for differentiation, such as the product rule, hold for complex-valued 
functions. These rules can be proved by applying the corresponding theorem for real 
functions to u and v, or else by carrying the proof for real functions over to the com- 
plex case. 

Recall the formula 


(7.21)    $e^{r + si} = e^r(\cos s + i\sin s)$.

Differentiation of this formula shows that $de^{at}/dt = ae^{at}$ for all complex numbers
$a = r + si$. Therefore $ce^{at}$ solves the differential equation (7.1), and the proof
given at the beginning of the section shows that these are the only solutions.

Having extended the case of one equation to complex coefficients, we can now 
use the method of diagonalization to solve a system of equations (7.5) when A is an 
arbitrary complex matrix with distinct eigenvalues. 

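For example, let $A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$. The vectors $v_1 = \begin{bmatrix} 1 \\ i \end{bmatrix}$ and $v_2 = \begin{bmatrix} i \\ 1 \end{bmatrix}$
are eigenvectors, with eigenvalues $1 + i$ and $1 - i$ respectively. Let $B = (v_1, v_2)$.
According to (6.7), $A$ is diagonalized by the matrix $P$, where

(7.22)    $P^{-1} = [B] = \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$.

Formula (7.14) tells us that $\tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix} = \begin{bmatrix} c_1e^{t+it} \\ c_2e^{t-it} \end{bmatrix}$. The solutions of (7.5) are

(7.23)    $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = P^{-1}\tilde{X} = \begin{bmatrix} c_1e^{t+it} + ic_2e^{t-it} \\ ic_1e^{t+it} + c_2e^{t-it} \end{bmatrix}$,

where $c_1, c_2$ are arbitrary complex numbers. So every solution is a linear
combination of the two basic solutions

(7.24)    $\begin{bmatrix} e^{t+it} \\ ie^{t+it} \end{bmatrix}$ and $\begin{bmatrix} ie^{t-it} \\ e^{t-it} \end{bmatrix}$.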
However, these solutions are not completely satisfactory, because we began with a 
system of differential equations with real coefficients, and the answer we obtained is 


complex. When the original matrix is real, we want to have real solutions. We note 
the following lemma: 


(7.25) Lemma. Let $A$ be a real $n \times n$ matrix, and let $X(t)$ be a complex-valued
solution of the differential equation (7.5). The real and imaginary parts of $X(t)$
solve the same equation. □


Now every solution of the original equation (7.5), whether real or complex, 
has the form (7.23) for some complex numbers c;. So the real solutions are among 
those we have found. To write them down explicitly, we may take the real and imag- 
inary parts of the complex solutions. 

The real and imaginary parts of the basic solutions (7.24) are determined using 
(7.21). They are 


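(7.26)    $\begin{bmatrix} e^t\cos t \\ -e^t\sin t \end{bmatrix}$ and $\begin{bmatrix} e^t\sin t \\ e^t\cos t \end{bmatrix}$.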


Every real solution is a real linear combination of these particular solutions. 
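
The passage from complex to real solutions can be tried out numerically. The sketch below assumes that the example matrix has rows (1, 1) and (-1, 1), with eigenvector $(1, i)^t$ for the eigenvalue $1 + i$; it builds the complex solution $e^{(1+i)t}(1, i)^t$ with Python's `cmath` module and splits it into real and imaginary parts. The function names are illustrative.

```python
import cmath
import math

def complex_solution(t):
    """The complex solution e^{(1+i)t} v with v = (1, i)^t."""
    w = cmath.exp((1 + 1j) * t)
    return [w, 1j * w]

def real_solutions(t):
    """Real and imaginary parts of the complex solution, as two real vectors."""
    X = complex_solution(t)
    return [z.real for z in X], [z.imag for z in X]

t0 = 0.7   # arbitrary sample time
re_part, im_part = real_solutions(t0)
# Compare with e^t (cos t, -sin t)^t and e^t (sin t, cos t)^t
expected_re = [math.exp(t0) * math.cos(t0), -math.exp(t0) * math.sin(t0)]
expected_im = [math.exp(t0) * math.sin(t0), math.exp(t0) * math.cos(t0)]
print(re_part, expected_re)
print(im_part, expected_im)
```

Starting from the other eigenvector gives the same two real vectors with the roles of real and imaginary part exchanged.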


8. THE MATRIX EXPONENTIAL 


Systems of first-order linear, constant-coefficient differential equations can also be 
solved formally, using the matrix exponential. The exponential of an n Xn real or 
complex matrix A is obtained by substituting a matrix into the Taylor’s series 


(8.1)    $1 + x/1! + x^2/2! + x^3/3! + \cdots$

for $e^x$. Thus by definition,

(8.2)    $e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots$.

This is an $n \times n$ matrix.

(8.3) Proposition. The series (8.2) converges absolutely for all complex matrices $A$.


In order not to break up the discussion, we have collected the proofs together at the 
end of the section. 


Since matrix multiplication is relatively complicated, it isn't easy to write
down the matrix entries of $e^A$ directly. In particular, the entries of $e^A$ are usually
not obtained by exponentiating the entries of $A$. But one case in which they are, and
in which the exponential is easily computed, is when $A$ is a diagonal matrix, say with
diagonal entries $a_i$. Inspection of the series shows that $e^A$ is also diagonal in this
case and that its diagonal entries are $e^{a_i}$.

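The exponential is also relatively easy to compute for a triangular $2 \times 2$
matrix. For example, let

(8.4)    $A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$.

Then

(8.5)    $e^A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1}{1!}\begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} + \frac{1}{2!}\begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix} + \cdots = \begin{bmatrix} e & * \\ 0 & e^2 \end{bmatrix}$.

The diagonal entries are exponentiated to obtain the diagonal entries of $e^A$. It is a
good exercise to calculate the missing entry $*$ directly from the definition.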
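
The definition by power series translates directly into a computation. The sketch below sums the first 30 terms of the series for an upper triangular $2 \times 2$ matrix; the particular matrix, with rows (1, 1) and (0, 2), and the number of terms are choices made for this illustration. The name `exp_series` is hypothetical.

```python
import math

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, terms=30):
    """Partial sum I + A + A^2/2! + ... of the exponential series."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]   # current term A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[1, 1], [0, 2]]   # illustrative upper triangular matrix
E = exp_series(A)
print(E)               # diagonal entries are close to e and e^2
```

For this matrix the upper right entry of $A^k$ is $2^k - 1$, so the corresponding entry of the exponential comes out as $e^2 - e$.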

The exponential of a matrix $A$ can also be determined whenever we know a
matrix $P$ such that $PAP^{-1}$ is diagonal. Using the rule $PA^kP^{-1} = (PAP^{-1})^k$ and the
distributive law for matrix multiplication, we find

(8.6)    $Pe^AP^{-1} = PIP^{-1} + (PAP^{-1}) + \frac{1}{2!}(PAP^{-1})^2 + \cdots = e^{PAP^{-1}}$.

Suppose that $PAP^{-1} = \tilde{A}$ is diagonal, with diagonal entries $\lambda_i$. Then $e^{\tilde{A}}$ is also
diagonal, and its diagonal entries are $e^{\lambda_i}$. Therefore we can compute $e^A$ explicitly:

(8.7)    $e^A = P^{-1}e^{\tilde{A}}P$.


In order to use the matrix exponential to solve systems of differential
equations, we need to extend some of the properties of the ordinary exponential to it.
The most fundamental property is $e^{x+y} = e^xe^y$. This property can be expressed as
a formal identity between the two infinite series which are obtained by expanding

(8.8)    $e^{x+y} = 1 + (x + y)/1! + (x + y)^2/2! + \cdots$ and
         $e^xe^y = (1 + x/1! + x^2/2! + \cdots)(1 + y/1! + y^2/2! + \cdots)$.

We can not substitute matrices into this identity because the commutative law is
needed to obtain equality of the two series. For instance, the quadratic terms of
(8.8), computed without the commutative law, are $\frac{1}{2}(x^2 + xy + yx + y^2)$ and
$\frac{1}{2}x^2 + xy + \frac{1}{2}y^2$. They are not equal unless $xy = yx$. So there is no reason to expect
$e^{A+B}$ to equal $e^Ae^B$ in general. However, if two matrices $A$ and $B$ happen to
commute, the formal identity can be applied.


(8.9) Proposition. 


(a) The formal expansions of (8.8), with commuting variables x,y, are equal. 
(b) Let $A, B$ be complex $n \times n$ matrices which commute: $AB = BA$. Then

$e^{A+B} = e^Ae^B$.

The proof is at the end of the section. □

(8.10) Corollary. For any $n \times n$ complex matrix $A$, the exponential $e^A$ is
invertible, and its inverse is $e^{-A}$.

This follows from the proposition because $A$ and $-A$ commute, and hence

$e^Ae^{-A} = e^{A-A} = e^0 = I$.
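
The role of the commutativity hypothesis can be seen in a small experiment. In the sketch below, the non-commuting pair `X`, `Y` is an example chosen for this illustration, while the commuting pair is $2I$ and $3e_{12}$; exponentials are approximated by partial sums of the series, and all helper names are hypothetical.

```python
import math

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def exp_series(A, terms=40):
    """Partial sum of the exponential series I + A + A^2/2! + ..."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = add(total, term)
    return total

# A pair that does not commute: XY = e11 but YX = e22.
X, Y = [[0, 1], [0, 0]], [[0, 0], [1, 0]]
sum_first = exp_series(add(X, Y))                   # approximates e^{X+Y}
product = matmul(exp_series(X), exp_series(Y))      # approximates e^X e^Y

# A pair that commutes: 2I and B = 3 e12.
D, B = [[2, 0], [0, 2]], [[0, 3], [0, 0]]
commuting_sum = exp_series(add(D, B))
commuting_product = matmul(exp_series(D), exp_series(B))
print(sum_first, product)
print(commuting_sum, commuting_product)
```

In the first case the two results differ visibly; in the second they agree up to rounding, with value $e^2\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$.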


As a sample application of Proposition (8.9b), consider the matrix 


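(8.11)    $A = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}$.

We can compute its exponential by writing it in the form $A = 2I + B$, where
$B = 3e_{12}$. Since $2I$ commutes with $B$, Proposition (8.9b) applies: $e^A = e^{2I}e^B$, and
from the series expansion we read off the values $e^{2I} = e^2I$ and $e^B = I + B$. Thus

$e^A = \begin{bmatrix} e^2 & 0 \\ 0 & e^2 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} e^2 & 3e^2 \\ 0 & e^2 \end{bmatrix}$.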


We now come to the main result relating the matrix exponential to differential
equations. Given an $n \times n$ matrix $A$, we consider the exponential $e^{tA}$, $t$ being a
variable scalar, as a matrix-valued function:

(8.12)    $e^{tA} = I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots$.

(8.13) Proposition. $e^{tA}$ is a differentiable function of $t$, and its derivative is $Ae^{tA}$.

The proof is at the end of the section. □

(8.14) Theorem. Let $A$ be a real or complex $n \times n$ matrix. The columns of the
matrix $e^{tA}$ form a basis for the vector space of solutions of the differential equation

$\frac{dX}{dt} = AX$.


We will need the following lemma, whose proof is an exercise:

(8.15) Lemma. Product rule: Let $A(t)$ and $B(t)$ be differentiable matrix-valued
functions of $t$, of suitable sizes so that their product is defined. Then the matrix
product $A(t)B(t)$ is differentiable, and its derivative is

$\frac{d}{dt}\left(A(t)B(t)\right) = \frac{dA}{dt}B + A\frac{dB}{dt}$.

Proof of Theorem (8.14). Proposition (8.13) shows that the columns of $e^{tA}$
solve the differential equation, because differentiation and multiplication by $A$ act
independently on the columns of the matrix $e^{tA}$. To show that every solution is a linear
combination of the columns, we copy the proof given at the beginning of Section 7.
Let $X(t)$ be an arbitrary solution of (7.5). We differentiate the matrix product
$e^{-tA}X(t)$, obtaining

$\frac{d}{dt}\left(e^{-tA}X(t)\right) = -Ae^{-tA}X(t) + e^{-tA}AX(t)$.

Fortunately, $A$ and $e^{-tA}$ commute. This follows directly from the definition of the
exponential. So the derivative is zero. Therefore, $e^{-tA}X(t)$ is a constant column vector,
say $C = (c_1, \ldots, c_n)^t$, and $X(t) = e^{tA}C$. This expresses $X(t)$ as a linear combination of
the columns of $e^{tA}$. The expression is unique because $e^{tA}$ is an invertible matrix. □


According to Theorem (8.14), the matrix exponential always solves the
differential equation (7.5). Since direct computation of the exponential can be quite
difficult, this theorem may not be easy to apply in a concrete situation. But if $A$ is a
diagonalizable matrix, then the exponential can be computed as in (8.7):
$e^{tA} = P^{-1}e^{t\tilde{A}}P$. We can use this method of evaluating $e^{tA}$ to solve equation (7.5), but
of course it gives the same result as before. Thus if $A$ is the matrix used in example
(7.7), so that $P$, $\tilde{A}$ are as in (7.17), then

$e^{t\tilde{A}} = \begin{bmatrix} e^{5t} & 0 \\ 0 & e^{2t} \end{bmatrix}$

and

$e^{tA} = P^{-1}e^{t\tilde{A}}P = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} e^{5t} & 0 \\ 0 & e^{2t} \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} e^{5t} + 2e^{2t} & 2e^{5t} - 2e^{2t} \\ e^{5t} - e^{2t} & 2e^{5t} + e^{2t} \end{bmatrix}$.

The columns we have obtained form a second basis for the general solution (7.18).


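On the other hand, the matrix $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, which represents the system of
equations

(8.16)    $\frac{dx}{dt} = x, \quad \frac{dy}{dt} = x + y$,

is not diagonalizable. So the method of Section 7 can not be applied. To solve it, we
write $At = It + Bt$, where $B = e_{21}$, and find, as in the discussion of (8.11),

(8.17)    $e^{At} = e^{It}e^{Bt} = \begin{bmatrix} e^t & 0 \\ 0 & e^t \end{bmatrix}\begin{bmatrix} 1 & 0 \\ t & 1 \end{bmatrix} = \begin{bmatrix} e^t & 0 \\ te^t & e^t \end{bmatrix}$.

Thus the solutions of (8.16) are linear combinations of the columns

(8.18)    $\begin{bmatrix} e^t \\ te^t \end{bmatrix}, \quad \begin{bmatrix} 0 \\ e^t \end{bmatrix}$.

To compute the exponential explicitly in all cases requires putting the matrix into
Jordan form (see Chapter 12).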
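
For a non-diagonalizable matrix the series can still be summed numerically and compared with a closed form. This sketch does so for $A = I + e_{21}$, that is, the matrix with rows (1, 0) and (1, 1), at one sample value of $t$; the sample value and the helper names are arbitrary choices for illustration.

```python
import math

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, terms=30):
    """Partial sum of the exponential series I + A + A^2/2! + ..."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[1, 0], [1, 1]]   # I + e21, not diagonalizable
t = 0.5                # arbitrary sample time
series_value = exp_series([[t * x for x in row] for row in A])
closed_form = [[math.exp(t), 0.0], [t * math.exp(t), math.exp(t)]]
print(series_value)
print(closed_form)
```

The entry $te^t$ in the lower left corner is the feature that cannot arise from a diagonalizable matrix with the single eigenvalue 1.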

We now go back to prove Propositions (8.3), (8.9), and (8.13). For want of a
more compact notation, we will denote the $i,j$-entry of a matrix $A$ by $A_{ij}$ here. So
$(AB)_{ij}$ will stand for the entry of the product matrix $AB$, and $(A^k)_{ij}$ for the entry of $A^k$.
With this notation, the $i,j$-entry of $e^A$ is the sum of the series

(8.19)    $(e^A)_{ij} = I_{ij} + A_{ij} + \frac{1}{2!}(A^2)_{ij} + \frac{1}{3!}(A^3)_{ij} + \cdots$.

In order to prove that the series for the exponential converges, we need to
show that the entries of the powers $A^k$ of a given matrix do not grow too fast, so that
the absolute values of the $i,j$-entries form a bounded (and hence convergent) series.
Let us define the norm of an $n \times n$ matrix $A$ to be the maximum absolute value of the
matrix entries:

(8.20)    $\|A\| = \max_{i,j}|A_{ij}|$.

In other words, $\|A\|$ is the smallest real number such that

(8.21)    $|A_{ij}| \le \|A\|$ for all $i, j$.


This is one of several possible definitions of the norm. Its basic property is as
follows:

(8.22) Lemma. Let $A, B$ be complex $n \times n$ matrices. Then $\|AB\| \le n\|A\|\|B\|$,
and $\|A^k\| \le n^{k-1}\|A\|^k$ for all $k > 0$.

Proof. We estimate the size of the $i,j$-entry of $AB$:

$|(AB)_{ij}| = \left|\sum_{\nu} A_{i\nu}B_{\nu j}\right| \le \sum_{\nu} |A_{i\nu}||B_{\nu j}| \le n\|A\|\|B\|$.

Thus $\|AB\| \le n\|A\|\|B\|$. The second inequality follows by induction from the first
inequality. □
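
The two inequalities of the lemma are easy to test on examples. The following sketch checks them on random integer matrices, so that all comparisons are exact; the matrix size, the range of the entries, and the random seed are arbitrary choices, and the helper names are hypothetical.

```python
import random

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def norm(A):
    """Maximum absolute value of the entries of A."""
    return max(abs(x) for row in A for x in row)

random.seed(0)
n = 3

def random_matrix():
    """An n x n matrix with integer entries between -5 and 5."""
    return [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]

checks = []
for _ in range(100):
    A, B = random_matrix(), random_matrix()
    # First inequality: ||AB|| <= n ||A|| ||B||
    checks.append(norm(matmul(A, B)) <= n * norm(A) * norm(B))
    # Second inequality: ||A^k|| <= n^(k-1) ||A||^k for k = 2, ..., 5
    power = A
    for k in range(2, 6):
        power = matmul(power, A)
        checks.append(norm(power) <= n ** (k - 1) * norm(A) ** k)
print(all(checks))
```

Such a test proves nothing, of course, but it is a quick guard against misreading the exponents in the statement.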


Proof of Proposition (8.3). To prove that the matrix exponential converges
absolutely, we estimate the series as follows: Let $a = n\|A\|$. Then

(8.23)    $\begin{aligned} |(e^A)_{ij}| &\le |I_{ij}| + |A_{ij}| + \frac{1}{2!}|(A^2)_{ij}| + \frac{1}{3!}|(A^3)_{ij}| + \cdots \\ &\le 1 + \|A\| + \frac{1}{2!}n\|A\|^2 + \frac{1}{3!}n^2\|A\|^3 + \cdots \\ &= 1 + \left(a + \frac{1}{2!}a^2 + \frac{1}{3!}a^3 + \cdots\right)/n = 1 + (e^a - 1)/n. \end{aligned}$ □


Proof of Proposition (8.9). 
(a) The terms of degree $k$ in the expansions of (8.8) are

$(x + y)^k/k! = \sum_{r+s=k} \binom{k}{r}x^ry^s/k!$ and $\sum_{r+s=k} \frac{x^r}{r!}\frac{y^s}{s!}$.

To show that these terms are equal, we have to show that

$\binom{k}{r}/k! = \frac{1}{r!\,s!}$, or $\binom{k}{r} = \frac{k!}{r!\,s!}$,

for all $k$ and all $r, s$ such that $r + s = k$. This is a standard formula for
binomial coefficients.


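(b) Denote by $S_n(x)$ the partial sum $1 + x/1! + x^2/2! + \cdots + x^n/n!$. Then

$S_n(x)S_n(y) = (1 + x/1! + \cdots + x^n/n!)(1 + y/1! + \cdots + y^n/n!) = \sum_{r,s=0}^{n} \frac{x^r}{r!}\frac{y^s}{s!}$,

while

$S_n(x + y) = (1 + (x + y)/1! + (x + y)^2/2! + \cdots + (x + y)^n/n!) = \sum_{k=0}^{n}\sum_{r+s=k} \binom{k}{r}x^ry^s/k! = \sum_{k=0}^{n}\sum_{r+s=k} \frac{x^r}{r!}\frac{y^s}{s!}$.

Comparing terms, we find that the expansion of the partial sum $S_n(x + y)$
consists of the terms in $S_n(x)S_n(y)$ such that $r + s \le n$. The same is true when
we substitute commuting matrices $A, B$ for $x, y$. We must show that the sum of
the remaining terms tends to zero as the index of the partial sum tends to infinity.
From here on we denote this index by $k$, since $n$ is the size of the matrices.

(8.24) Lemma. The series $\sum_{k}\sum_{r+s=k} \left|\left(\frac{A^r}{r!}\frac{B^s}{s!}\right)_{ij}\right|$ converges for all $i, j$.

Proof. Let $a = n\|A\|$ and $b = n\|B\|$. We estimate the terms in the sum.
According to (8.22), $|(A^rB^s)_{ij}| \le n(n^{r-1}\|A\|^r)(n^{s-1}\|B\|^s) \le a^rb^s$. Therefore

$\sum_{k}\sum_{r+s=k} \left|\left(\frac{A^r}{r!}\frac{B^s}{s!}\right)_{ij}\right| \le \sum_{k}\sum_{r+s=k} \frac{a^r}{r!}\frac{b^s}{s!} = e^{a+b}$. □

The proposition follows from this lemma because, on the one hand, the $i,j$-entry of
$S_k(A)S_k(B) - S_k(A + B)$ is bounded by

$\sum_{r+s>k} \left|\left(\frac{A^r}{r!}\frac{B^s}{s!}\right)_{ij}\right|$.

According to the lemma, this sum tends to zero as $k \to \infty$. And on the other hand,

$S_k(A)S_k(B) - S_k(A + B) \to e^Ae^B - e^{A+B}$. □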


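Proof of Proposition (8.13). By definition,

$\frac{d}{dt}e^{tA} = \lim_{h \to 0} \frac{e^{(t+h)A} - e^{tA}}{h}$.

Since the matrices $tA$ and $hA$ commute, Proposition (8.9) shows that

$\frac{e^{(t+h)A} - e^{tA}}{h} = \left(\frac{e^{hA} - I}{h}\right)e^{tA}$.

So our proposition follows from this lemma:

(8.25) Lemma. $\lim_{h \to 0} \frac{e^{hA} - I}{h} = A$.

Proof. The series expansion for the exponential shows that

(8.26)    $\frac{e^{hA} - I}{h} - A = \frac{h}{2!}A^2 + \frac{h^2}{3!}A^3 + \cdots$.

We estimate this series: Let $a = |h|n\|A\|$. Then

$\begin{aligned} \left|\left(\frac{h}{2!}A^2 + \frac{h^2}{3!}A^3 + \cdots\right)_{ij}\right| &\le \left|\frac{h}{2!}(A^2)_{ij}\right| + \left|\frac{h^2}{3!}(A^3)_{ij}\right| + \cdots \\ &\le \frac{1}{2!}|h|n\|A\|^2 + \frac{1}{3!}|h|^2n^2\|A\|^3 + \cdots \\ &= \|A\|\left(\frac{1}{2!}a + \frac{1}{3!}a^2 + \cdots\right) = \|A\|\left(\frac{e^a - 1}{a} - 1\right). \end{aligned}$

Note that $a \to 0$ as $h \to 0$, and that $(e^a - 1)/a \to 1$ as $a \to 0$, because the
derivative of $e^x$ at $x = 0$ is 1. So (8.26) tends to zero with $h$. □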


We will use the remarkable properties of the matrix exponential again, in 
Chapter 8.


I have not thought it necessary to undertake the labour 
of a formal proof of the theorem in the general case. 


Arthur Cayley 
EXERCISES

1. The Dimension Formula


20 es 
: ; Bs : : . Compute ker T and im T 
33) BG 


. Let T be left multiplication by the matrix 
0 
explicitly by exhibiting bases for these spaces, and verify (1.7). 
2. Determine the rank of the matrix $\begin{bmatrix} 11 & 12 & 13 & 14 \\ 21 & 22 & 23 & 24 \\ 31 & 32 & 33 & 34 \\ 41 & 42 & 43 & 44 \end{bmatrix}$.


3. Let $T: V \to W$ be a linear transformation. Prove that $\ker T$ is a subspace of $V$ and that
$\operatorname{im} T$ is a subspace of $W$.

4. Let $A$ be an $m \times n$ matrix. Prove that the space of solutions of the linear system $AX = 0$
has dimension at least $n - m$.

5. Let $A$ be a $k \times m$ matrix and let $B$ be an $n \times p$ matrix. Prove that the rule $M \rightsquigarrow AMB$
defines a linear transformation from the space $F^{m \times n}$ of $m \times n$ matrices to the space
$F^{k \times p}$.

6. Let $(v_1, \ldots, v_n)$ be a subset of a vector space $V$. Prove that the map $\varphi: F^n \to V$
defined by $\varphi(X) = v_1x_1 + \cdots + v_nx_n$ is a linear transformation.

7. When the field is one of the fields $\mathbb{F}_p$, finite-dimensional vector spaces have finitely many
elements. In this case, formula (1.6) and formula (6.15) from Chapter 2 both apply.
Reconcile them.

8. Prove that every $m \times n$ matrix $A$ of rank 1 has the form $A = XY^t$, where $X, Y$ are $m$- and
$n$-dimensional column vectors.

9. (a) The left shift operator $S^-$ on $V = \mathbb{R}^\infty$ is defined by $(a_1, a_2, \ldots) \rightsquigarrow (a_2, a_3, \ldots)$.
Prove that $\ker S^- > 0$, but $\operatorname{im} S^- = V$.
(b) The right shift operator $S^+$ on $V = \mathbb{R}^\infty$ is defined by $(a_1, a_2, \ldots) \rightsquigarrow (0, a_1, a_2, \ldots)$.
Prove that $\ker S^+ = 0$, but $\operatorname{im} S^+ < V$.


2. The Matrix of a Linear Transformation


1. Determine the matrix of the differentiation operator $\frac{d}{dx}: P_n \to P_{n-1}$ with respect to the
natural bases (see (1.4)).
2. Find all linear transformations $T: \mathbb{R}^2 \to \mathbb{R}^2$ which carry the line $y = x$ to the line
$y = 3x$.
3. Prove Proposition (2.9b) using row and column operations.
4. Let $T: \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined by the rule $T(x_1, x_2, x_3)^t =
(x_1 + x_2, 2x_3 - x_1)^t$. What is the matrix of $T$ with respect to the standard bases?
5. Let $A$ be an $n \times n$ matrix, and let $V = F^n$ denote the space of row vectors. What is the
matrix of the linear operator "right multiplication by $A$" with respect to the standard basis
of $V$?
6. Prove that different matrices define different linear transformations.
7. Describe left multiplication and right multiplication by the matrix (2.10), and prove that
the rank of this matrix is $r$.
8. Prove that $A$ and $A^t$ have the same rank.
9. Let $T_1, T_2$ be linear transformations from $V$ to $W$. Define $T_1 + T_2$ and $cT$ by the rules
$[T_1 + T_2](v) = T_1(v) + T_2(v)$ and $[cT](v) = cT(v)$.
(a) Prove that $T_1 + T_2$ and $cT_1$ are linear transformations, and describe their matrices in
terms of the matrices for $T_1, T_2$.
(b) Let $L$ be the set of all linear transformations from $V$ to $W$. Prove that these laws
make $L$ into a vector space, and compute its dimension.


3. Linear Operators and Eigenvectors 


1. Let V be the vector space of real 2 x 2 symmetric matrices X = k : |. and let 


Vv 


2) 


A= ? ' Determine the matrix of the linear operator on V defined by X ~~~» AXA', 


with respect to a suitable basis. 


2. Let $A = (a_{ij})$, $B = (b_{ij})$ be $2 \times 2$ matrices, and consider the operator $T: M \rightsquigarrow AMB$ on
the space $F^{2 \times 2}$ of $2 \times 2$ matrices. Find the matrix of $T$ with respect to the basis
$(e_{11}, e_{12}, e_{21}, e_{22})$ of $F^{2 \times 2}$.

3. Let $T: V \to V$ be a linear operator on a vector space of dimension 2. Assume that $T$ is
not multiplication by a scalar. Prove that there is a vector $v \in V$ such that $(v, T(v))$ is a
basis of $V$, and describe the matrix of $T$ with respect to that basis.


4. Let $T$ be a linear operator on a vector space $V$, and let $c \in F$. Let $W$ be the set of
eigenvectors of $T$ with eigenvalue $c$, together with 0. Prove that $W$ is a $T$-invariant subspace.


5. Find all invariant subspaces of the real linear operator whose matrix is as follows. 
1 
| 
(a) i (b)} 2 
. 

6. An operator on a vector space $V$ is called nilpotent if $T^k = 0$ for some $k$. Let $T$ be a
nilpotent operator, and let $W^i = \operatorname{im} T^i$.
(a) Prove that if $W^i \ne 0$, then $\dim W^{i+1} < \dim W^i$.
(b) Prove that if $V$ is a space of dimension $n$ and if $T$ is nilpotent, then $T^n = 0$.


7. Let $T$ be a linear operator on $\mathbb{R}^2$. Prove that if $T$ carries a line $\ell$ to $\ell$, then it also carries
every line parallel to $\ell$ to another line parallel to $\ell$.


8. Prove that the composition $T_1 \circ T_2$ of linear operators on a vector space is a linear
operator, and compute its matrix in terms of the matrices $A_1, A_2$ of $T_1, T_2$.


9. Let $P$ be the real vector space of polynomials $p(x) = a_0 + a_1x + \cdots + a_nx^n$ of degree
$\le n$, and let $D$ denote the derivative $\frac{d}{dx}$, considered as a linear operator on $P$.
(a) Find the matrix of $D$ with respect to a convenient basis, and prove that $D$ is a
nilpotent operator.
(b) Determine all the $D$-invariant subspaces.


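10. Prove that the matrices $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$ and $\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$ ($b \ne 0$) are similar if and only if $a \ne d$.


11. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a real $2 \times 2$ matrix. Prove that $A$ can be reduced to a matrix
$\begin{bmatrix} 0 & * \\ 1 & * \end{bmatrix}$ by row and column operations of the form $A \to EAE^{-1}$, unless $b = c = 0$ and
$a = d$. Make a careful case analysis to take care of the possibility that $b$ or $c$ is zero.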

12. Let $T$ be a linear operator on $\mathbb{R}^2$ with two linearly independent eigenvectors $v_1, v_2$.
Assume that the eigenvalues $c_1, c_2$ of these eigenvectors are positive and that $c_1 > c_2$. Let $\ell_i$ be
the line spanned by $v_i$.
(a) The operator $T$ carries every line $\ell$ through the origin to another line. Using the
parallelogram law for vector addition, show that every line $\ell \ne \ell_2$ is shifted away from
$\ell_2$ toward $\ell_1$.
(b) Use (a) to prove that the only eigenvectors are multiples of $v_1$ or $v_2$.
(c) Describe the effect on lines when there is a single line carried to itself, with positive
eigenvalue.


13. Consider an arbitrary $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. The condition that a column vector $X$
be an eigenvector for left multiplication by $A$ is that $Y = AX$ be parallel to $X$, which means
that the slopes $s = x_2/x_1$ and $s' = y_2/y_1$ are equal.
(a) Find the equation in $s$ which expresses this equality.
(b) For which $A$ is $s = 0$ a solution? $s = \infty$?
(c) Prove that if the entries of $A$ are positive real numbers, then there is an eigenvector in
the first quadrant and also one in the second quadrant.


4. The Characteristic Polynomial 


1. Compute the characteristic polynomials, eigenvalues, and eigenvectors of the following 
complex matrices. 


(a) s | (b) ee 
2. (a) Prove that the eigenvalues of a real symmetric 2 X 2 matrix are real numbers. 
(b) Prove that a real 2 X 2 matrix whose off-diagonal entries are positive has real 
eigenvalues. 
3. Find the complex eigenvalues and eigenvectors of the rotation matrix

$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.


4. Prove that a real 3 X 3 matrix has at least one real eigenvalue. 
5. Determine the characteristic polynomial of the matrix 


0 1 
1 01 
i 
6. Prove Proposition (4.18).
7. (a) Let $T$ be a linear operator having two linearly independent eigenvectors with the
same eigenvalue $\lambda$. Is it true that $\lambda$ is a multiple root of the characteristic polynomial
of $T$?
(b) Suppose that $\lambda$ is a multiple root of the characteristic polynomial. Does $T$ have two
linearly independent eigenvectors with eigenvalue $\lambda$?


8. Let $V$ be a vector space with basis $(v_1, \ldots, v_n)$ over a field $F$, and let $a_1, \ldots, a_{n-1}$ be
elements of $F$. Define a linear operator on $V$ by the rules $T(v_i) = v_{i+1}$ if $i < n$ and
$T(v_n) = a_1v_1 + a_2v_2 + \cdots + a_{n-1}v_{n-1}$.
(a) Determine the matrix of $T$ with respect to the given basis.
(b) Determine the characteristic polynomial of $T$.


9. Do $A$ and $A^t$ have the same eigenvalues? the same eigenvectors?


10. (a) Use the characteristic polynomial to prove that a $2 \times 2$ real matrix $P$ all of whose
entries are positive has two distinct real eigenvalues.
(b) Prove that the larger eigenvalue has an eigenvector in the first quadrant, and the
smaller eigenvalue has an eigenvector in the second quadrant.


11. (a) Let $A$ be a $3 \times 3$ matrix, with characteristic polynomial

$p(t) = t^3 - (\operatorname{tr} A)t^2 + s_1t - (\det A)$.

Prove that $s_1$ is the sum of the symmetric $2 \times 2$ subdeterminants:

$s_1 = \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \det\begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix} + \det\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}$.

*(b) Generalize to $n \times n$ matrices.


12. Let $T$ be a linear operator on a space of dimension $n$, with eigenvalues $\lambda_1, \ldots, \lambda_n$.
(a) Prove that $\operatorname{tr} T = \lambda_1 + \cdots + \lambda_n$ and that $\det T = \lambda_1 \cdots \lambda_n$.
(b) Determine the other coefficients of the characteristic polynomial in terms of the
eigenvalues.


*13. Consider the linear operator of left multiplication of an $n \times n$ matrix $A$ on the space $F^{n \times n}$
of all $n \times n$ matrices. Compute the trace and the determinant of this operator.


*14. Let P be a real matrix such that P‘ = P*. What are the possible eigenvalues of P? 


15. Let $A$ be a matrix such that $A^n = I$. Prove that the eigenvalues of $A$ are powers of the $n$th root
of unity $\zeta_n = e^{2\pi i/n}$.


5. Orthogonal Matrices and Rotations 


1. What is the matrix of the three-dimensional rotation through the angle $\theta$ about the axis
$e_2$?


2. Prove that every orthonormal set of $n$ vectors in $\mathbb{R}^n$ is a basis.


3. Prove algebraically that a real $2 \times 2$ matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ represents a rotation if and only if it
is in $SO_2$.


4. (a) Prove that $O_n$ and $SO_n$ are subgroups of $GL_n(\mathbb{R})$, and determine the index of $SO_n$ in
$O_n$.
(b) Is $O_2$ isomorphic to the product group $SO_2 \times \{\pm I\}$? Is $O_3$ isomorphic to $SO_3 \times \{\pm I\}$?


5. What are the eigenvalues of the matrix $A$ which represents the rotation of $\mathbb{R}^3$ by $\theta$ about
an axis $v$?


6. Let $A$ be a matrix in $O_3$ whose determinant is $-1$. Prove that $-1$ is an eigenvalue of $A$.


7. Let $A$ be an orthogonal $2 \times 2$ matrix whose determinant is $-1$. Prove that $A$ represents a
reflection about a line through the origin.


8. Let $A$ be an element of $SO_3$ with angle of rotation $\theta$. Show that $\cos\theta = \frac{1}{2}(\operatorname{tr} A - 1)$.


9. Every real polynomial of degree 3 has a real root. Use this fact to give a less tricky proof
of Lemma (5.23).


*10. Find a geometric way to determine the axis of rotation for the composition of two
three-dimensional rotations.


11. Let $v$ be a vector of unit length, and let $P$ be the plane in $\mathbb{R}^3$ orthogonal to $v$. Describe a
bijective correspondence between points on the unit circle in $P$ and matrices $P \in SO_3$
whose first column is $v$.


12. Describe geometrically the action of an orthogonal matrix with determinant $-1$.


13. Prove that a rigid motion, as defined by (5.15), is bijective.


*14. Let $A$ be an element of $SO_3$. Show that if it is defined, the vector

$((a_{23} + a_{32})^{-1}, (a_{13} + a_{31})^{-1}, (a_{12} + a_{21})^{-1})^t$

is an eigenvector with eigenvalue 1.


6. Diagonalization 


1. (a) Find the eigenvectors and eigenvalues of the matrix [matrix illegible in source].
(b) Find a matrix P such that $PAP^{-1}$ is diagonal.
(c) Compute [expression illegible in source].

2. Diagonalize the rotation matrix $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ using complex numbers.

3. Prove that if A, B are $n \times n$ matrices and if A is nonsingular, then AB is similar to BA.

4. Let A be a complex matrix having zero as its only eigenvalue. Prove or disprove: A is
nilpotent.

5. In each case, if the matrix is diagonalizable, find a matrix P such that $PAP^{-1}$ is diagonal.
(a) [matrix illegible in source]
(b) [matrix illegible in source]
(c) a $3 \times 3$ matrix whose first row is illegible in the source and whose remaining rows are $(0, 4, 5)$ and $(0, 0, 6)$
(d) $\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$

6. Can the diagonalization (6.1) be done with a matrix $P \in SL_n$?

7. Prove that a linear operator T is nilpotent if and only if there is a basis of V such that the
matrix of T is upper triangular, with diagonal entries zero.

8. Let T be a linear operator on a space of dimension 2. Assume that the characteristic
polynomial of T is $(t - a)^2$. Prove that there is a basis of V such that the matrix of T has one
of the two forms $\begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}$, $\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$.

9. Let A be a nilpotent matrix. Prove that $\det(I + A) = 1$.

10. Prove that if A is a nilpotent $n \times n$ matrix, then $A^n = 0$.

11. Find all real $2 \times 2$ matrices such that $A^2 = I$, and describe geometrically the way they
operate by left multiplication on $\mathbb{R}^2$.

12. Let M be a matrix made up of two diagonal blocks: $M = \begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix}$. Prove that M is
diagonalizable if and only if A and D are.

13. (a) Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a $2 \times 2$ matrix with eigenvalue $\lambda$. Show that $(b, \lambda - a)^t$ is an
eigenvector for A.
(b) Find a matrix P such that $PAP^{-1}$ is diagonal, if A has two distinct eigenvalues
$\lambda_1 \neq \lambda_2$.

14. Let A be a complex $n \times n$ matrix. Prove that there is a matrix B arbitrarily close to A
(meaning that $|b_{ij} - a_{ij}|$ can be made arbitrarily small for all i, j) such that B has n
distinct eigenvalues.

*15. Let A be a complex $n \times n$ matrix with n distinct eigenvalues $\lambda_1, \ldots, \lambda_n$. Assume that $\lambda_1$ is
the largest eigenvalue, that is, that $|\lambda_1| > |\lambda_i|$ for all $i > 1$. Prove that for most vectors
X the sequence $X_k = \lambda_1^{-k} A^k X$ converges to an eigenvector Y with eigenvalue $\lambda_1$, and
describe precisely what the conditions on X are for this to be the case.

16. (a) Use the method of the previous problem to compute the largest eigenvalue of the
matrix [matrix illegible in source] to three-place accuracy.
(b) Compute the largest eigenvalue of the matrix [matrix illegible in source] to three-place accuracy.

*17. Let A be $m \times m$ and B be $n \times n$ complex matrices, and consider the linear operator T on
the space $F^{m \times n}$ of all complex $m \times n$ matrices defined by $T(M) = AMB$.
(a) Show how to construct an eigenvector for T out of a pair of column vectors X, Y,
where X is an eigenvector for A and Y is an eigenvector for $B^t$.
(b) Determine the eigenvalues of T in terms of those of A and B.

*18. Let A be an $n \times n$ complex matrix.
(a) Consider the linear operator T defined on the space $F^{n \times n}$ of all complex $n \times n$
matrices by the rule $T(B) = AB - BA$. Prove that the rank of this operator is at most
$n^2 - n$.
(b) Determine the eigenvalues of T in terms of the eigenvalues $\lambda_1, \ldots, \lambda_n$ of A.


7. Systems of Differential Equations 


1. Let v be an eigenvector for the matrix A, with eigenvalue c. Prove that $e^{ct}v$ solves the
differential equation $\frac{dX}{dt} = AX$.

2. Solve the equation $\frac{dX}{dt} = AX$ for the following matrices A:
(a) [matrix illegible in source]
(b) [matrix illegible in source]
(c) a $3 \times 3$ matrix with rows $(\,\cdot\,, 2, 3)$, $(0, 4, 5)$, $(0, 0, 6)$, whose first entry is illegible in the source
(d) $\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$

3. Explain why diagonalization gives the general solution.

4. (a) Prove Proposition (7.16).
(b) Why is it enough to write down the real and imaginary parts to get the general
solution?

5. Prove Lemma (7.25).

6. Solve the inhomogeneous differential equation $\frac{dX}{dt} = AX + B$ in terms of the solutions to
the homogeneous equation $\frac{dX}{dt} = AX$.

7. A differential equation of the form $d^n x/dt^n + a_{n-1}\,d^{n-1}x/dt^{n-1} + \cdots + a_1\,dx/dt + a_0 x = 0$
can be rewritten as a system of first-order equations by the following trick: We
introduce unknown functions $x_0, x_1, \ldots, x_{n-1}$ with $x = x_0$, and we set $dx_i/dt = x_{i+1}$ for
$i = 0, \ldots, n - 2$. The original equation can be rewritten as the system $dx_i/dt = x_{i+1}$,
$i = 0, \ldots, n - 2$, and $dx_{n-1}/dt = -(a_{n-1}x_{n-1} + \cdots + a_1 x_1 + a_0 x_0)$. Determine the
matrix which represents this system of equations.

8. (a) Rewrite the second-order linear equation in one variable

$\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = 0$

as a system of two first-order equations in two unknowns $x_0 = x$, $x_1 = dx/dt$.
(b) Solve the system when $b = -4$ and $c = 3$.


9. Let A be an $n \times n$ matrix, and let B(t) be a column vector of continuous functions on the
‘ 


interval [a, B]. Define F(t) = | e “B(t) dt. 


(a) Prove that X = F(t) is a solution of the differential equation $X' = AX + B(t)$ on the
interval (a, B). 
(b) Determine all solutions of this equation on the interval. 


8. The Matrix Exponential


. Compute e4 for the following matrices A: 


of: )or 


|| sbaegel 
Leta =| iM 


(a) Compute e“ directly from the expansion. 
(b) Compute e“ by diagonalizing the matrix. 


. Compute e4 for the following matrices A: 


ae | 
(a) et w {5 1 ©) ‘0 


. Compute e“ for the following matrices A: 


(a) kes ald (b) Pe oh 


277i 2ni 877i 


5. Let A be an $n \times n$ matrix. Prove that the map $t \mapsto e^{tA}$ is a homomorphism from the
additive group $\mathbb{R}^+$ to $GL_n(\mathbb{C})$.

6. Find two matrices A, B such that $e^{A+B} \neq e^A e^B$.
7. Prove the formula $e^{\operatorname{trace} A} = \det(e^A)$.
8 
9 


dt (2 
. Let f(t) be a polynomial, and let T be a linear operator. Prove that f(7) is a linear 
operator. 
10. Let A be a symmetric matrix, and let f(r) be a polynomial. Prove that f(A) is symmetric. 
11. Prove the product rule for differentiation of matrix-valued functions. 
12. Let A(t), B(t) be differentiable matrix-valued functions of t. Compute the following. 
(a) d/dt(A(1)’) 
(b) d/dt(A(t)"'), assuming that A(t). is invertible for all f 
(c) d/dt(a(t) 'B(2)) 
13. Let X be an eigenvector of an $n \times n$ matrix A, with eigenvalue $\lambda$.
(a) Prove that if A is invertible then X is also an eigenvector for $A^{-1}$, and that its
eigenvalue is $\lambda^{-1}$.
(b) Let p(t) be a polynomial. Then X is an eigenvector for p(A), with eigenvalue $p(\lambda)$.
(c) Prove that X is an eigenvector for $e^A$, with eigenvalue $e^\lambda$.


. Solve the differential equation Oe AX, when A = E | 


14. For an $n \times n$ matrix A, define sin A and cos A by using the Taylor's series expansions for
sin x and cos x.
(a) Prove that these series converge for all A.
(b) Prove that sin tA is a differentiable function of t and that $d(\sin tA)/dt = A\cos tA$.
15. Discuss the range of validity of the following identities.
(a) $\cos^2 A + \sin^2 A = I$
(b) $e^{iA} = \cos A + i\sin A$
(c) $\sin(A + B) = \sin A\cos B + \cos A\sin B$
(d) $\cos(A + B) = \cos A\cos B - \sin A\sin B$
(e) $e^{2\pi i A} = I$
(f) $d(e^{A(t)})/dt = e^{A(t)}A'(t)$, where A(t) is a differentiable matrix-valued function of t.
16. (a) Derive the product rule for differentiation of complex-valued functions in two ways: 
directly, and by writing x(t) = u(t) + iv(t) and applying the product rule for real- 
valued functions. 
(b) Let f(t) be a complex-valued function of a real variable t, and let y(u) be a real- 
valued function of u. State and prove the chain rule for f(g (w)). 
17. (a) Let $B_k$ be a sequence of $m \times n$ matrices which converges to a matrix B, and let P be
an $m \times m$ matrix. Prove that $PB_k$ converges to PB.
(b) Prove that if m = n and P is invertible, then $PB_kP^{-1}$ converges to $PBP^{-1}$.
18. Let $f(x) = \sum c_k x^k$ be a power series such that $\sum c_k A^k$ converges when A is a sufficiently
small $n \times n$ matrix. Prove that A and f(A) commute.


19. Determine : det A(t), when A(t) is a differentiable matrix function of t. 


Miscellaneous Problems 


1. What are the possible eigenvalues of a linear operator T such that (a) 7’ = /, 
(b) T’ = 0, (c) T? — 5T + 6 = 0? 
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2. 


*4, 


it: 


12. 


oa) Li 


14. 


15. 


A linear operator T is called nilpotent if some power of T is zero.

(a) Prove that T is nilpotent if and only if its characteristic polynomial is $t^n$, n = dim V.

(b) Prove that if T is a nilpotent operator on a vector space of dimension n, then $T^n = 0$.

(c) A linear operator T is called unipotent if $T - I$ is nilpotent. Determine the
characteristic polynomial of a unipotent operator. What are its possible eigenvalues?


3. Let A be an $n \times n$ complex matrix. Prove that if trace $A^i = 0$ for all i, then A is nilpotent.


Let A, B be complex $n \times n$ matrices, and let $C = AB - BA$. Prove that if C commutes with
A then C is nilpotent.

Let $\lambda_1, \ldots, \lambda_n$ be the roots of the characteristic polynomial p(t) of a complex matrix A.
Prove the formulas trace $A = \lambda_1 + \cdots + \lambda_n$ and $\det A = \lambda_1 \cdots \lambda_n$.


6. Let T be a linear operator on a real vector space V such that $T^2 = I$. Define subspaces as
follows:

$W^+ = \{v \in V \mid T(v) = v\}$, $W^- = \{v \in V \mid T(v) = -v\}$.

Prove that V is isomorphic to the direct sum $W^+ \oplus W^-$.

7. The Frobenius norm |A| of an $n \times n$ matrix A is defined to be the length of A when it is
considered as an $n^2$-dimensional vector: $|A|^2 = \sum |a_{ij}|^2$. Prove the following
inequalities: $|A + B| \leq |A| + |B|$ and $|AB| \leq |A|\,|B|$.

8. Let $T\colon V \to V$ be a linear operator on a finite-dimensional vector space V. Prove that
there is an integer n so that $(\ker T^n) \cap (\operatorname{im} T^n) = 0$.

9. Which infinite matrices represent linear operators on the space Z [Chapter 3 (5.2d)]?
*10. 


The k X k minors of an m X n matrix A are the square submatrices obtained by crossing 

out m — k rows and n — k columns. Let A be a matrix of rank r. Prove that some r X r 

minor is invertible and that no (r + 1) X (r + 1) minor is invertible. 

Let g: F"——> F be left multiplication by an m X n matrix A. Prove that the following 

are equivalent. 

(a) A has a right inverse, a matrix B such that AB = /. 

(b) ¢ is surjective. 

(c) There is an m X m minor of A whose determinant is not zero. 

Let g: F"—— F”™ be left multiplication by an m X n matrix A. Prove that the following 

are equivalent. 

(a) A has a left inverse, a matrix B such that BA = /. 

(b) ¢ is injective. 

(c) There is an n X n minor of A whose determinant is not zero. 

Let A be an n X n matrix such that A’ = /. Prove that if A has only one eigenvalue @, then 

A= ZI. 

(a) Without using the characteristic polynomial, prove that a linear operator on a vector 
space of dimension n can have at most n different eigenvalues. 

(b) Use (a) to prove that a polynomial of degree n with coefficients in a field F has at 
most n roots in F. 

Let A be an $n \times n$ matrix, and let $p(t) = t^n + c_{n-1}t^{n-1} + \cdots + c_1 t + c_0$ be its
characteristic polynomial. The Cayley-Hamilton Theorem asserts that

$p(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = 0$.


(a) Prove the Cayley—Hamilton Theorem for 2 X 2 matrices. 
(b) Prove it for diagonal matrices. 
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16. 


a7. 


18. 


*19. 


*20. 


(c) Prove it for diagonalizable matrices. 
*(d) Show that every complex n X n matrix is arbitrarily close to a diagonalizable matrix, 
and use this fact to extend the proof for diagonalizable matrices to all complex ma- 
trices by continuity. 
(a) Use the Cayley-Hamilton Theorem to give an expression for A™' in terms of A, 

(det A)"', and the coefficients of the characteristic polynomial. 
(b) Verify this expression in the 2 X 2 case by direct computation. 
Let A be a $2 \times 2$ matrix. The Cayley-Hamilton Theorem allows all powers of A to be
written as linear combinations of I and A. Therefore it is plausible that $e^A$ is also such a
linear combination.
(a) Prove that if a, b are the eigenvalues of A and if $a \neq b$, then

$e^A = \dfrac{ae^b - be^a}{a - b}\,I + \dfrac{e^a - e^b}{a - b}\,A$.

(b) Find the correct formula for the case that A has two equal eigenvalues.


The Fibonacci numbers 0,1,1,2,3,5,8,... are defined by the recursive relations 
fn = fn—-1 + fn—-2, with the initial conditions fo = 0, fi = |. This recursive relation can 


1 1 es [ n | 
1 f, 


Ma) Gh 
where a = V5. 


(b) Suppose that the sequence a, is defined by the relation an = $(an—1_+ @n—2). Com- 
pute lima, in terms of do, a1. 

Let A be an n X n real positive matrix, and let X € R” be a column vector. Let us use the 

shorthand notation X > 0 or X = 0 to mean that all entries of the vector X are positive or 

nonnegative, respectively. By “positive quadrant” we mean the set of vectors xX = 0. 

(But note that xX = 0 and X # 0 do not imply X > 0 in our sense.) 

(a) Prove that if X = 0 and xX # O then Ax > 0. 

(b) Let C denote the set of pairs (x,t), t ER, such that x = 0, |x| = 1, and 
(A - tl)X = 0. Prove that C is a compact set in R”*'. 

(c) The function ¢ takes on a maximum value on C, say at the point (Xo, fo). Then 
(A — tol)Xo = O. Prove that (A — to/)Xo = 0. 

(d) Prove that Xo is an eigenvector with eigenvalue t) by showing that otherwise the vec- 
tor AXo = X; would contradict the maximality of t. 

(e) Prove that fo is the eigenvalue of A with largest absolute value. 


Let A = A(t) be a matrix of functions. What goes wrong when you try to prove that, in 


analogy with n = 1, the matrix 
exo( | Aludu) 
to 


is a solution of the system dx/dt = AX? Can you find conditions on the matrix function 
A(t) which will make this a solution? 


(a) Prove the formula 
Symmetry


L'algèbre n'est qu'une géométrie écrite;
la géométrie n'est qu'une algèbre figurée.


Sophie Germain 


The study of symmetry provides one of the most appealing applications of group the- 
ory. Groups were first invented to analyze symmetries of certain algebraic structures 
called field extensions, and because symmetry is a common phenomenon in all sci- 
ences, it is still one of the two main ways in which group theory is applied. The 
other way is through group representations, which will be discussed in Chapter 9. In 
the first four sections of this chapter, we will study the symmetry of plane figures in 
terms of groups of rigid motions of the plane. Plane figures provide a rich source of 
examples and a background for the general concept of group operation, which is in- 
troduced in Section 5. 

When studying symmetry, we will allow ourselves to use geometric reasoning 
without bothering to carry the arguments back to the axioms of geometry. That can 
be left for another occasion. 


1. SYMMETRY OF PLANE FIGURES


The possible symmetry of plane figures is usually classified into the main types
shown in Figures (1.1-1.3).


(1.1) Figure. Bilateral symmetry.
(1.2) Figure. Rotational symmetry.


(1.3) Figure. Translational symmetry.


A fourth type of symmetry also exists, though it may be slightly less familiar:


(1.4) Figure. Glide symmetry.


Figures such as wallpaper patterns may have two independent translational
symmetries, as shown in Figure (1.5):


(1.5) Figure.


Other combinations of symmetries may also occur. For instance, the star has bilateral
as well as rotational symmetry. Figure (1.6) is an example in which translational
and rotational symmetry are combined:


(1.6) Figure. 


Another example is shown in Figure (1.7): 


(1.7) Figure. 




As in Section 5 of Chapter 4, we call a map $m\colon P \to P$ from the plane P to
itself a rigid motion, or an isometry, if it is distance-preserving, that is, if for any
two points $p, q \in P$ the distance from p to q is equal to the distance from m(p) to
m(q). We will show in the next section that the rigid motions are translations, rotations,
reflections, and glide reflections. They form a group M whose law of composition
is composition of functions.

If a rigid motion m carries a subset F of the plane to itself, we call it a symme- 
try of F. The set of all symmetries of F always forms a subgroup G of M, called the 
group of symmetries of the figure. The fact that G is a subgroup is clear: If m and m' 
carry F to F, then so does the composed map mm’, and so on. 

The group of symmetries of the bilaterally symmetric Figure (1.1) consists of
two elements: the identity transformation 1 and the reflection r about a line called
the axis of symmetry. We have the relation rr = 1, which shows that G is a cyclic
group of order 2, as it must be, because there is no other group of order 2.

The group of symmetries of Figure (1.3) is an infinite cyclic group generated
by the motion which carries it one unit to the left. We call such a motion a translation
t:

$G = \{\ldots, t^{-2}, t^{-1}, 1, t, t^2, \ldots\}$.


The symmetry groups of Figures (1.4, 1.6, 1.7) contain elements besides translations 
and are therefore larger. Do the exercise of describing their elements. 


2. THE GROUP OF MOTIONS OF THE PLANE 


This section describes the group M of all rigid motions of the plane. The coarsest 
classification of motions is into the orientation-preserving motions, those which do 
not flip the plane over, and the orientation-reversing motions which do flip it over 
(see Chapter 4, Section 5). We can use this partition of M to define a map 


$M \to \{\pm 1\}$,


by sending the orientation-preserving motions to 1 and the orientation-reversing
motions to $-1$. You will convince yourself without difficulty that this map is a
homomorphism: The product of two orientation-reversing motions is
orientation-preserving, and so on.
A finer classification of the motions is as follows: 


(2.1)


(a) The orientation-preserving motions:
(i) Translation: parallel motion of the plane by a vector a: $p \mapsto p + a$.
(ii) Rotation: rotates the plane by an angle $\theta \neq 0$ about some point.


(b) The orientation-reversing motions:
(i) Reflection about a line $\ell$.
(ii) Glide reflection: obtained by reflecting about a line $\ell$, and then translating
by a nonzero vector a parallel to $\ell$.
(2.2) Theorem. The above list is complete. Every rigid motion is a translation, a
rotation, a reflection, a glide reflection, or the identity. 


This theorem is remarkable. One consequence is that the composition of rotations 
about two different points is a rotation about a third point, unless it is a translation. 
This fact follows from the theorem, because the composition preserves orientation, 
but it is not obvious. 

Some of the other compositions are easier to visualize. The composition of rotations
through angles $\theta$ and $\eta$ about the same point is again a rotation, through the
angle $\theta + \eta$, about that point. The composition of translations by the vectors a and
b is the translation by their sum a + b.

Note that a translation does not leave any point fixed (unless the vector a is
zero, in which case it is the identity map). Glides do not have fixed points either. On
the other hand, a rotation fixes exactly one point, the center of rotation, and a
reflection fixes the points on the line of reflection. Hence the composition of
reflections about two nonparallel lines $\ell_1$, $\ell_2$ is a rotation about the intersection point
$p = \ell_1 \cap \ell_2$. This follows from the theorem, because the composition does fix p,
and it is orientation-preserving. The composition of two reflections about parallel
lines is a translation by a vector orthogonal to the lines.

In order to prove Theorem (2.2), and also to be able to compute conveniently
in the group M, we are going to choose some special motions as generators for the
group. We will obtain defining relations similar to the relations (1.18) in Chapter 2
which define the symmetric group $S_3$, but since M is infinite, there will be more of
them.

Let us identify the plane with the space $\mathbb{R}^2$ of column vectors, by choosing a
coordinate system. Having done this, we choose as generators the translations, the
rotations about the origin, and the reflection about the x-axis:


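(2.3)

(a) Translation $t_a$ by a vector a: $t_a(x) = x + a = \begin{bmatrix} x_1 + a_1 \\ x_2 + a_2 \end{bmatrix}$.

(b) Rotation $\rho_\theta$ by an angle $\theta$ about the origin:
$\rho_\theta(x) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$.

(c) Reflection r about the $x_1$-axis: $r(x) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ -x_2 \end{bmatrix}$.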


Since they fix the origin, the rotations $\rho_\theta$ and the reflection r are orthogonal operators
on $\mathbb{R}^2$. A translation is not a linear operator: it does not send zero to itself, except
of course for translation by the zero vector.

The motions (2.3) are not all of the elements of M. For example, rotation
about a point other than the origin is not listed, nor are reflections about other lines.
However, they do generate the group: Every element of M is a product of such elements.
It is easily seen that any rigid motion m can be obtained by composing them.
Either


(2.4) $m = t_a\rho_\theta$ or else $m = t_a\rho_\theta r$,


for some vector a and angle $\theta$, possibly zero. To see this, we recall that every rigid
motion is the composition of an orthogonal operator followed by a translation
[Chapter 4 (5.20)]. So we can write m in the form $m = t_a m'$, where $m'$ is an orthogonal
operator. Next, if $\det m' = 1$, then it is one of the rotations $\rho_\theta$. This follows
from Theorem (5.5) of Chapter 4. So in this case, $m = t_a\rho_\theta$. Finally, if
$\det m' = -1$, then $\det m'r = 1$, so $m'r$ is a rotation $\rho_\theta$. Since $r^2 = 1$, $m' = \rho_\theta r$ in
this case, and $m = t_a\rho_\theta r$.
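The argument above can be traced numerically. The following is only a sketch under stated assumptions: a rigid motion is given as a Python function on coordinate pairs, and the helper names `make_motion` and `decompose` are hypothetical, introduced here for illustration. It reads off the translation vector as the image of the origin, and then uses the sign of the determinant of the remaining orthogonal operator to decide which of the two forms (2.4) applies.

```python
import math

def make_motion(a, theta, flip):
    """Return x -> t_a rho_theta r^flip (x) as a function (hypothetical helper)."""
    c, s = math.cos(theta), math.sin(theta)
    def m(x):
        x1, x2 = x
        if flip:                      # r, the reflection about the x1-axis, acts first
            x2 = -x2
        return (c * x1 - s * x2 + a[0], s * x1 + c * x2 + a[1])
    return m

def decompose(m):
    """Recover (a, theta, flip) with m = t_a rho_theta r^flip."""
    a = m((0.0, 0.0))                          # translation part: image of the origin
    e1 = m((1.0, 0.0))
    e2 = m((0.0, 1.0))
    col1 = (e1[0] - a[0], e1[1] - a[1])        # columns of the orthogonal operator m'
    col2 = (e2[0] - a[0], e2[1] - a[1])
    det = col1[0] * col2[1] - col1[1] * col2[0]
    flip = 0 if det > 0 else 1                 # det m' = 1: rotation; det m' = -1: rotation times r
    theta = math.atan2(col1[1], col1[0])       # m'(e1) = rho_theta(e1) in both cases
    return a, theta, flip
```

The angle is returned in the range used by `math.atan2`, so it agrees with the input angle only up to a multiple of $2\pi$.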

The expression of a motion m as a product (2.4) is unique. For suppose that m
is expressed in two ways: $m = t_a\rho_\theta r^i = t_b\rho_\eta r^j$, where i, j are 0 or 1. Since m is
orientation-preserving if i = 0 and orientation-reversing if i = 1, we must have
i = j, and so we can cancel r from both sides if necessary, to obtain the equality
$t_a\rho_\theta = t_b\rho_\eta$. Multiplying both sides on the left by $t_{-b}$ and on the right by $\rho_{-\theta}$, we
find $t_{a-b} = \rho_{\eta-\theta}$. But a translation is not a rotation unless both are the trivial
operations. So a = b and $\theta = \eta$. □

Computation in M can be done with the symbols $t_a$, $\rho_\theta$, r using rules for composing
them which can be calculated from the formulas (2.3). The necessary rules
are as follows:


(2.5)
$t_a t_b = t_{a+b}$, $\rho_\theta\rho_\eta = \rho_{\theta+\eta}$, $rr = 1$,
$\rho_\theta t_a = t_{a'}\rho_\theta$, where $a' = \rho_\theta(a)$,
$r t_a = t_{a'} r$, where $a' = r(a)$,
$r\rho_\theta = \rho_{-\theta} r$.


Using these rules, we can reduce any product of our generators to one of the two
forms (2.4). The form we get is uniquely determined, because there is only one
expression of the form (2.4) for a given motion.
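As an illustration of how the rules (2.5) reduce a product to the form (2.4), here is a minimal sketch. It assumes a motion $t_a\rho_\theta r^i$ is stored as a triple `(a, theta, i)`; this encoding and the names `apply` and `compose` are assumptions made for the example. Pushing $r^i$ and $\rho_\theta$ past $t_b$ with the rules gives $(t_a\rho_\theta r^i)(t_b\rho_\eta r^j) = t_{a + \rho_\theta r^i(b)}\,\rho_{\theta + (-1)^i\eta}\,r^{i+j}$, which is what `compose` computes.

```python
import math

def rot(theta, v):
    """rho_theta applied to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def refl(v):
    """r: reflection about the x1-axis."""
    return (v[0], -v[1])

def apply(m, x):
    """Apply m = t_a rho_theta r^i to the point x (r first, t_a last)."""
    a, theta, i = m
    if i:
        x = refl(x)
    x = rot(theta, x)
    return (x[0] + a[0], x[1] + a[1])

def compose(m1, m2):
    """Reduce the product m1 m2 (m2 acts first) to the form (2.4) using rules (2.5)."""
    (a, theta, i), (b, eta, j) = m1, m2
    if i:                      # r t_b = t_{r(b)} r   and   r rho_eta = rho_{-eta} r
        b = refl(b)
        eta = -eta
    b = rot(theta, b)          # rho_theta t_b = t_{rho_theta(b)} rho_theta
    return ((a[0] + b[0], a[1] + b[1]), theta + eta, (i + j) % 2)
```

Applying `compose(m1, m2)` to a point should agree, up to rounding, with applying `m2` and then `m1`.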


Proof of Theorem (2.2). Let m be a rigid motion which preserves orientation but is
not a translation. We want to prove that m is a rotation about some point. It is clear
that an orientation-preserving motion which fixes a point p in the plane must be a
rotation about p. So we must show that every orientation-preserving motion m which
is not a translation fixes some point. We write $m = t_a\rho_\theta$ as in (2.4). By assumption,
$\theta \neq 0$. One can use the geometric picture in Figure (2.6) to find the fixed point. In
it, $\ell$ is the line through the origin and perpendicular to a, and the sector with angle
$\theta$ is situated so as to be bisected by $\ell$. The point p is determined by inserting the
vector a into the sector, as shown. To check that m fixes p, remember that the operation
$\rho_\theta$ is the one which is made first, and is followed by $t_a$.

(2.6) Figure. The fixed point of an orientation-preserving motion.


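Another way to find the fixed point is by solving the equation $x = t_a\rho_\theta(x)$
algebraically for x. By definition of a translation, $t_a(\rho_\theta(x)) = \rho_\theta(x) + a$. So the
equation we need to solve is


$x - \rho_\theta(x) = a$, or


(2.7) $\begin{bmatrix} 1 - \cos\theta & \sin\theta \\ -\sin\theta & 1 - \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$.

Note that $\det(1 - \rho_\theta) = 2 - 2\cos\theta$. The determinant is not zero if $\theta \neq 0$, so
there is a unique solution for x.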
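The linear system (2.7) is small enough to solve in closed form. The sketch below inverts the $2 \times 2$ matrix directly; the function names `fixed_point` and `motion` and the tolerance used to reject $\theta = 0$ are assumptions made for this illustration.

```python
import math

def fixed_point(a, theta):
    """Solve (1 - rho_theta) x = a, i.e. system (2.7), for the fixed point of t_a rho_theta."""
    c, s = math.cos(theta), math.sin(theta)
    det = 2.0 - 2.0 * c                      # det(1 - rho_theta)
    if abs(det) < 1e-12:
        raise ValueError("theta is a multiple of 2*pi, so the motion is a translation")
    # Inverse of [[1-c, s], [-s, 1-c]] is (1/det) * [[1-c, -s], [s, 1-c]].
    x1 = ((1.0 - c) * a[0] - s * a[1]) / det
    x2 = (s * a[0] + (1.0 - c) * a[1]) / det
    return (x1, x2)

def motion(a, theta, x):
    """Apply m = t_a rho_theta to the point x."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1] + a[0], s * x[0] + c * x[1] + a[1])
```

For example, with a = (1, 0) and $\theta = \pi/2$ the solution is the point (1/2, 1/2): rotating it by a quarter turn gives (-1/2, 1/2), and adding a brings it back.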


(2.8) Corollary. The motion $m = t_a\rho_\theta$ is the rotation through the angle $\theta$ about
its fixed point.


Proof. As we just saw, the fixed point of m is the one which satisfies the relation
$p = \rho_\theta(p) + a$. Then for any x,


$m(p + x) = t_a\rho_\theta(p + x) = \rho_\theta(p + x) + a = \rho_\theta(p) + \rho_\theta(x) + a = p + \rho_\theta(x)$.


Thus m sends $p + x$ to $p + \rho_\theta(x)$. So it is the rotation about p through the angle $\theta$,
as required. □


Next, we will show that any orientation-reversing motion $m = t_a\rho_\theta r$ is a glide
reflection or a reflection (which we may consider to be a glide reflection having glide
vector zero). We do this by finding a line $\ell$ which is sent to itself by m, and so that
the motion of m on $\ell$ is a translation. It is clear geometrically that an
orientation-reversing motion which acts in this way on a line is a glide reflection.

The geometry is more complicated here, so we will reduce the problem in two
steps. First, the motion $\rho_\theta r = r'$ is a reflection about a line. The line is the one
which intersects the $x_1$-axis at an angle of $\tfrac{1}{2}\theta$ at the origin. This is not hard to see,
geometrically or algebraically. So our motion m is the product of the translation $t_a$
and the reflection $r'$. We may as well rotate coordinates so that the $x_1$-axis becomes
the line of reflection of $r'$. Then $r'$ becomes our standard reflection r, and the
translation $t_a$ remains a translation, though the coordinates of the vector a will have
changed. In this new coordinate system, the motion is written as $m = t_a r$, and it
acts as

$m\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 + a_1 \\ -x_2 + a_2 \end{bmatrix}$.

This motion sends the line $x_2 = \tfrac{1}{2}a_2$ to itself, by the translation
$(x_1, \tfrac{1}{2}a_2)^t \mapsto (x_1 + a_1, \tfrac{1}{2}a_2)^t$, and so m is a glide along this line. □
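The two-step reduction can also be carried out without changing coordinates. The sketch below is an illustration under assumptions: the function names `glide_data` and `motion` are hypothetical, and the vector a is split into a component along the mirror line of $\rho_\theta r$ (direction at angle $\tfrac{1}{2}\theta$) and a component perpendicular to it. The parallel component is the glide vector, and half of the perpendicular component locates the invariant line, which matches the line $x_2 = \tfrac{1}{2}a_2$ when $\theta = 0$.

```python
import math

def glide_data(a, theta):
    """For m = t_a rho_theta r, return (point q on the axis, unit direction u, glide vector)."""
    u = (math.cos(theta / 2), math.sin(theta / 2))   # direction of the mirror line of rho_theta r
    n = (-u[1], u[0])                                # unit normal to that line
    along = a[0] * u[0] + a[1] * u[1]                # component of a parallel to the line
    across = a[0] * n[0] + a[1] * n[1]               # component of a perpendicular to the line
    q = (0.5 * across * n[0], 0.5 * across * n[1])   # the invariant line passes through q
    glide = (along * u[0], along * u[1])
    return q, u, glide

def motion(a, theta, x):
    """Apply m = t_a rho_theta r to the point x."""
    c, s = math.cos(theta), math.sin(theta)
    x1, x2 = x[0], -x[1]                             # r acts first
    return (c * x1 - s * x2 + a[0], s * x1 + c * x2 + a[1])
```

Every point of the line through q in the direction u should be moved by exactly the glide vector; if the glide vector is zero, m is a reflection.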


There are two important subgroups of M for which we must introduce 
notation: 


(2.9) 


T, the group of translations. 
O, the group of orthogonal operators. 


The group O consists of the motions leaving the origin fixed. It contains the rotations 
about the origin and reflections about lines through the origin. 
Notice that with our choice of coordinates we get a bijective correspondence


(2.10) $\mathbb{R}^2 \to T$, $a \mapsto t_a$.

This is an isomorphism of the additive group $(\mathbb{R}^2)^+$ with the subgroup T, because
$t_a t_b = t_{a+b}$.
The elements of O are linear operators. Again making use of our choice of coordinates,
we can associate an element $m \in O$ to its matrix. Doing so, we obtain an
isomorphism


$O_2 \to O$

from the group $O_2$ of orthogonal $2 \times 2$ matrices to O [see Chapter 4 (5.16)].


We can also consider the subgroup of M of motions fixing a point of the plane 
other than the origin. This subgroup is related to O as follows: 


(2.11) Proposition.


(a) Let p be a point of the plane. Let $\rho_\theta'$ denote rotation through the angle $\theta$ about
p, and let $r'$ denote reflection about the line through p and parallel to the
x-axis. Then $\rho_\theta' = t_p\rho_\theta t_p^{-1}$ and $r' = t_p r t_p^{-1}$.


(b) The subgroup of M of motions fixing p is the conjugate subgroup
$O' = t_p O t_p^{-1}$.

Proof. We can obtain the rotation $\rho_\theta'$ in this way: First translate p to the
origin, next rotate the plane about the origin through the angle $\theta$, and finally
translate the origin back to p:


$\rho_\theta' = t_p\rho_\theta t_{-p} = t_p\rho_\theta t_p^{-1}$.

The reflection $r'$ can be obtained in the same way from r:

$r' = t_p r t_{-p} = t_p r t_p^{-1}$.


This proves (a). Since every motion fixing p has the form $\rho_\theta'$ or $\rho_\theta' r'$ [see the proof
of (2.4)], (b) follows from (a). □


There is an important homomorphism $\varphi$ from M to O whose kernel is T, which
is obtained by dropping the translation from the products (2.4):


(2.12) $\varphi\colon M \to O$, $t_a\rho_\theta \mapsto \rho_\theta$, $t_a\rho_\theta r \mapsto \rho_\theta r$.


This may look too naive to be a good definition, but formulas (2.5) show that $\varphi$ is a
homomorphism: $(t_a\rho_\theta)(t_b\rho_\eta) = t_a t_{b'}\rho_\theta\rho_\eta = t_{a+b'}\rho_{\theta+\eta}$, where $b' = \rho_\theta(b)$, hence
$\varphi(t_a\rho_\theta t_b\rho_\eta) = \rho_{\theta+\eta}$, and so on. Since T is the kernel of a homomorphism, it is a
normal subgroup of M. Note that we cannot define a homomorphism from M to T in this way.


(2.13) Proposition. Let p be any point of the plane, and let $\rho_\theta'$ denote rotation
through the angle $\theta$ about p. Then $\varphi(\rho_\theta') = \rho_\theta$. Similarly, if $r'$ is reflection about
the line through p and parallel to the x-axis, then $\varphi(r') = r$.


This follows from (2.11a), because $t_p$ is in the kernel of $\varphi$. The proposition can
also be expressed as follows:


(2.14) The homomorphism $\varphi$ does not depend on the choice of origin. □


3. FINITE GROUPS OF MOTIONS 


In this section we investigate the possible finite groups of symmetry of figures such 
as (1.1) and (1.2). So we are led to the study of finite subgroups G of the group M 
of rigid motions of the plane. 

The key observation which allows us to describe all finite subgroups is the fol- 
lowing theorem. 


(3.1) Theorem. Fixed Point Theorem: Let G be a finite subgroup of the group of
motions M. There is a point p in the plane which is left fixed by every element of G,
that is, there is a point p such that $g(p) = p$ for all $g \in G$.


It follows, for example, that any subgroup of M which contains rotations about
two different points is infinite.


Here is a beautiful geometric proof of the theorem. Let s be any point in the
plane, and let S be the set of points which are the images of s under the various
motions in G. So each element $s' \in S$ has the form $s' = g(s)$ for some $g \in G$. This
set is called the orbit of s under the action of G. The element s is in the orbit because
the identity element 1 is in G, and $s = 1(s)$. A typical orbit is depicted below, for
the case that G is the group of symmetries of a regular pentagon.


Any element of the group G will permute the orbit S. In other words, if
$s' \in S$ and $x \in G$, then $x(s') \in S$. For, say that $s' = g(s)$, with $g \in G$. Since G
is a group, $xg \in G$. Therefore, by definition, $xg(s) \in S$. Since $xg(s) = x(s')$, this
shows that $x(s') \in S$.

We list the elements of S arbitrarily, writing $S = \{s_1, \ldots, s_n\}$. The fixed point
we are looking for is the center of gravity of the orbit, defined as


(3.2) $p = \tfrac{1}{n}(s_1 + \cdots + s_n)$,


where the right side is computed by vector addition, using an arbitrary coordinate
system in the plane. The center of gravity should be considered an average of the
points $s_1, \ldots, s_n$.


(3.3) Lemma. Let $S = \{s_1, \ldots, s_n\}$ be a finite set of points of the plane, and let p
be its center of gravity, defined by (3.2). Let m be a rigid motion, and let
$m(s_i) = s_i'$ and $m(p) = p'$. Then $p' = \tfrac{1}{n}(s_1' + \cdots + s_n')$. In other words, rigid
motions carry centers of gravity to centers of gravity.


Proof. This is clear by physical reasoning. It can also be shown by calculation.
To do so, it suffices to treat separately the cases $m = t_a$, $m = \rho_\theta$, and $m = r$,
since any motion is obtained from these by composition.


Case 1: $m = t_a$. Then $p' = p + a$ and $s_i' = s_i + a$. It is true that
$p + a = \tfrac{1}{n}((s_1 + a) + \cdots + (s_n + a))$.

Case 2: $m = \rho_\theta$ or r. Then m is a linear operator. Therefore

$p' = m(\tfrac{1}{n}(s_1 + \cdots + s_n)) = \tfrac{1}{n}(m(s_1) + \cdots + m(s_n)) = \tfrac{1}{n}(s_1' + \cdots + s_n')$.

The center of gravity of our set S is a fixed point for the action of G. For, any
element $g_i$ of G permutes the orbit $\{s_1, \ldots, s_n\}$, so Lemma (3.3) shows that it sends
the center of gravity to itself. This completes the proof of the theorem. □
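The proof can be checked on a concrete finite group. The sketch below is only an illustration: it takes for G the ten symmetries of a regular pentagon whose center is placed, by assumption, at the point (2, 3) and which has one horizontal mirror line; the names `pentagon_symmetries` and `centroid` are hypothetical. It forms the orbit of a point s and computes the center of gravity (3.2). The starting point is chosen off the mirror lines, so that the ten images are distinct and the list of images is the orbit.

```python
import math

def pentagon_symmetries(center):
    """The 10 symmetries of a regular pentagon centred at `center`, as functions on points."""
    cx, cy = center
    motions = []
    for k in range(5):
        theta = 2 * math.pi * k / 5
        c, s = math.cos(theta), math.sin(theta)
        for sign in (1, -1):                         # sign = -1 inserts the reflection
            def g(x, c=c, s=s, sign=sign):
                x1, x2 = x[0] - cx, sign * (x[1] - cy)          # move center to origin, then 1 or r
                return (c * x1 - s * x2 + cx, s * x1 + c * x2 + cy)  # rotate, then move back
            motions.append(g)
    return motions

def centroid(points):
    """Center of gravity (3.2) of a finite list of points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

G = pentagon_symmetries((2.0, 3.0))
s = (3.0, 3.5)
orbit = [g(s) for g in G]       # images of s under every element of G
p = centroid(orbit)             # candidate fixed point
```

Up to rounding, p should be the center (2, 3), and every element of G should send p to itself.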


Now let G be a finite subgroup of M. Theorem (3.1) tells us that there is a
point fixed by every element of G, and we may adjust coordinates so that this point
is the origin. Then G will be a subgroup of O. So to describe the finite subgroups G
of M, we need only describe the finite subgroups of O (or, since O is isomorphic to
the group of orthogonal $2 \times 2$ matrices, the finite subgroups of the orthogonal group
$O_2$). These subgroups are described in the following theorem.


(3.4) Theorem. Let G be a finite subgroup of the group O of rigid motions which 
fix the origin. Then G is one of the following groups: 


(a) $G = C_n$: the cyclic group of order n, generated by the rotation $\rho_\theta$, where
$\theta = 2\pi/n$.

(b) $G = D_n$: the dihedral group of order 2n, generated by two elements: the
rotation $\rho_\theta$, where $\theta = 2\pi/n$, and a reflection $r'$ about a line through the origin.


The proof of this theorem is at the end of the section. 


The group $D_n$ depends on the line of reflection, but of course we may choose
coordinates so that it becomes the x-axis, and then $r'$ becomes our standard
reflection r. If G were given as a finite subgroup of M, we would also need to shift
the origin to the fixed point in order to apply Theorem (3.4). So our end result about
finite groups of motions is the following corollary:


(3.5) Corollary. Let G be a finite subgroup of the group of motions M. If coordinates
are introduced suitably, then G becomes one of the groups $C_n$ or $D_n$, where $C_n$
is generated by $\rho_\theta$, $\theta = 2\pi/n$, and $D_n$ is generated by $\rho_\theta$ and r. □


When $n \ge 3$, the dihedral group $D_n$ is the group of symmetries of a regular
n-sided polygon. This is easy to see, and in fact it follows from the theorem. For a
regular n-gon has a group of symmetries which contains the rotation by $2\pi/n$ about
its center. It also contains some reflections. Theorem (3.4) tells us that it is $D_n$.

The dihedral groups $D_1$, $D_2$ are too small to be symmetry groups of an n-gon in
the usual sense. $D_1$ is the group $\{1, r\}$ of two elements. So it is a cyclic group, as is
$C_2$. But the nontrivial element of $D_1$ is a reflection, while in $C_2$ it is rotation through
the angle $\pi$. The group $D_2$ contains the four elements $\{1, \rho, r, \rho r\}$, where $\rho = \rho_\pi$.
It is isomorphic to the Klein four group. If we like, we can think of $D_1$ and $D_2$ as
groups of symmetry of the 1-gon and 2-gon:

[Figure: the 1-gon and the 2-gon.]

The dihedral groups are important examples, and it will be useful to have a
complete set of defining relations for them. They can be read off from the list of
defining relations for M (2.5). Let us denote the rotation $\rho_\theta$ ($\theta = 2\pi/n$) by x, and
the reflection r by y.


(3.6) Proposition. The dihedral group $D_n$ is generated by two elements x, y which
satisfy the relations

    $x^n = 1$,  $y^2 = 1$,  $yx = x^{-1}y$.

The elements of $D_n$ are

    $\{1, x, x^2, \ldots, x^{n-1};\ y, xy, x^2y, \ldots, x^{n-1}y\} = \{x^i y^j \mid 0 \le i < n,\ 0 \le j < 2\}$.


Proof. The elements $x = \rho_\theta$ and $y = r$ generate $D_n$ by definition of the group.
The relations $y^2 = 1$ and $yx = x^{-1}y$ are included in the list of relations (2.5) for M:
They are $rr = 1$ and $r\rho_\theta = \rho_{-\theta}r$. The relation $x^n = 1$ follows from the fact that
$\theta = 2\pi/n$, which also shows that the elements $1, x, \ldots, x^{n-1}$ are distinct. It follows
that the elements $y, xy, x^2y, \ldots, x^{n-1}y$ are also distinct and, since they are reflections
while the powers of x are rotations, that there is no repetition in the list of elements.
Finally, the relations can be used to reduce any product of $x, y, x^{-1}, y^{-1}$ to the form
$x^i y^j$, with $0 \le i < n$, $0 \le j < 2$. Therefore the list contains all elements of the
group generated by x, y, and since these elements generate $D_n$ the list is complete. □
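
The reduction to the normal form $x^i y^j$ can be sketched in code. This is an illustrative encoding, not part of the text: an element $x^i y^j$ is stored as the pair `(i, j)`, and the function name `dihedral_mul` is an assumption made for the example.

```python
def dihedral_mul(n, g, h):
    """Multiply x^i y^j by x^k y^l in D_n and return the normal form (i', j')."""
    (i, j), (k, l) = g, h
    # The relation y x^k = x^(-k) y lets us move y^j past x^k:
    # the exponent k changes sign exactly when j = 1.
    return ((i + (-k if j else k)) % n, (j + l) % 2)

def dihedral_elements(n):
    """The 2n normal forms x^i y^j with 0 <= i < n, 0 <= j < 2."""
    return [(i, j) for j in range(2) for i in range(n)]

n = 5
e, x, y = (0, 0), (1, 0), (0, 1)
elements = dihedral_elements(n)

yx = dihedral_mul(n, y, x)                 # equals x^(n-1) y, as in (3.7)
xy = dihedral_mul(n, x, y)
xyxy = dihedral_mul(n, xy, xy)             # equals the identity, as in (3.7)
```

Because every product lands back in the list of pairs, the list is closed under multiplication, which is the content of the last step of the proof.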


Using the first two relations (3.6), the third relation can be written in various
ways. It is equivalent to

(3.7)    $yx = x^{n-1}y$ and also to $xyxy = 1$.

Note that when n = 3, the relations are the same as for the symmetric group $S_3$
[Chapter 2 (1.18)].


(3.8) Corollary. The dihedral group $D_3$ and the symmetric group $S_3$ are isomorphic. □

For n > 3, the dihedral and symmetric groups are certainly not isomorphic, because
$D_n$ has order 2n, while $S_n$ has order n!.


Proof of Theorem (3.4). Let G be a finite subgroup of O. We need to remember
that the elements of O are the rotations $\rho_\theta$ and the reflections $\rho_\theta r$.

Case 1: All elements of G are rotations. We must prove that G is cyclic in this case.
The proof is similar to the determination of the subgroups of the additive group $\mathbb{Z}^+$
of integers [Chapter 2 (2.3)]. If $G = \{1\}$, then $G = C_1$. Otherwise G contains a
nontrivial rotation $\rho_\theta$. Let $\theta$ be the smallest positive angle of rotation among the
elements of G. Then G is generated by $\rho_\theta$. For let $\rho_\alpha$ be any element of G, where the
angle of rotation $\alpha$ is represented as usual by a real number. Let $n\theta$ be the greatest
integer multiple of $\theta$ which is at most $\alpha$, so that $\alpha = n\theta + \beta$, with $0 \le \beta < \theta$.
Since G is a group and since $\rho_\alpha$ and $\rho_\theta$ are in G, the product $\rho_\beta = \rho_\alpha \rho_{-n\theta}$ is also in
G. But by assumption $\theta$ is the smallest positive angle of rotation in G. Therefore
$\beta = 0$ and $\alpha = n\theta$. This shows that G is cyclic. Let $n\theta$ be the smallest multiple of $\theta$
which is $\ge 2\pi$, so that $2\pi \le n\theta < 2\pi + \theta$. Since $\theta$ is the smallest positive angle of
rotation in G, $n\theta = 2\pi$. Thus $\theta = 2\pi/n$ for some integer n.


Case 2: G contains a reflection. Adjusting coordinates as necessary, we may assume
that our standard reflection r is in G. Let H denote the subgroup of rotations in G.
We can apply what has been proved in Case 1 to the group H, to conclude that it is a
cyclic group: $H = C_n$. Then the 2n products $\rho_\theta^i$, $\rho_\theta^i r$, $0 \le i \le n - 1$, are in G,
and so G contains the dihedral group $D_n$. We must show that $G = D_n$. Now if an
element g of G is a rotation, then $g \in H$ by definition of H; hence g is one of the
elements of $D_n$. If g is a reflection, we can write it in the form $\rho_\alpha r$ for some rotation
$\rho_\alpha$ (2.8). Since r is in G, so is the product $\rho_\alpha r r = \rho_\alpha$. Therefore $\rho_\alpha$ is a power of
$\rho_\theta$, and g is in $D_n$ too. So $G = D_n$. This completes the proof of the theorem. □


4. DISCRETE GROUPS OF MOTIONS


In this section we will discuss the symmetry groups of unbounded figures such as 
wallpaper patterns. Our first task is to describe a substitute for the condition that the 
group is finite—one which includes the groups of symmetry of interesting un- 
bounded figures. Now one property which the patterns illustrated in the text have is 
that they do not admit arbitrarily small translations or rotations. Very special figures 
such as a line have arbitrarily small translational symmetries, and a circle, for exam- 
ple, has arbitrarily small rotational symmetries. It turns out that if such figures are 
ruled out, then the groups of symmetry can be classified. 


(4.1) Definition. A subgroup G of the group of motions M is called discrete if it
does not contain arbitrarily small translations or rotations. More precisely, G is
discrete if there is some real number $\epsilon > 0$ so that

(i) if $t_a$ is a translation in G by a nonzero vector a, then the length of a is at least
$\epsilon$: $|a| \ge \epsilon$;

(ii) if $\rho$ is a rotation in G about some point through a nonzero angle $\theta$, then the
angle $\theta$ is at least $\epsilon$: $|\theta| \ge \epsilon$.


Since the translations and rotations are all the orientation-preserving motions (2.1), 
this condition applies to all orientation-preserving elements of G. We do not impose 
a condition on the reflections and glides. The one we might ask for follows automat- 
ically from the condition imposed on orientation-preserving motions. 

The kaleidoscope principle can be used to show that every discrete group of
motions is the group of symmetries of a plane figure. We are not going to give
precise reasoning to show this, but the method can be made into a proof. Start with a
sufficiently random figure R in the plane. We require in particular that R shall not
have any symmetries except for the identity. So every element g of our group will
move R to a different position, call it gR. The required figure F is the union of all
the figures gR. An element x of G sends gR to xgR, which is also a part of F, and
hence it sends F to itself. If R is sufficiently random, G will be the full group of
symmetries of F. As we know from the kaleidoscope, the figure F is often very attractive.
Here is the result of applying this procedure in the case that G is the dihedral group
of symmetries of a regular pentagon:

[Figure: the figure F obtained when G is the dihedral group of symmetries of a regular pentagon.]


Of course many figures have the same group or have similar groups of symme- 
try. But nevertheless it is interesting and instructive to classify figures according to 
their groups of symmetry. We are going to discuss a rough classification of the 
groups, which will be refined in the exercises. 

The two main tools for studying a discrete group G are its translation group
and its point group. The translation group of G is the set of vectors a such that
$t_a \in G$. Since $t_a t_b = t_{a+b}$ and $t_{-a} = t_a^{-1}$, this is a subgroup of the additive group of
vectors, which we will denote by $L_G$. Using our choice of coordinates, we identify
the space of vectors with $\mathbb{R}^2$. Then

(4.2)    $L_G = \{a \in \mathbb{R}^2 \mid t_a \in G\}$.

This group is isomorphic to the subgroup $T \cap G$ of translations in G, by the
isomorphism (2.10): $a \mapsto t_a$. Since it is a subgroup of G, $T \cap G$ is discrete: A subgroup
of a discrete group is discrete. If we translate this condition over to $L_G$, we find

(4.3)    $L_G$ contains no vector of length $< \epsilon$, except for the zero vector.


A subgroup L of $\mathbb{R}^n$ which satisfies condition (4.3) for some $\epsilon > 0$ is called a
discrete subgroup of $\mathbb{R}^n$. Here the adjective discrete means that the elements of L
are separated by a fixed distance:

(4.4)    The distance between any two vectors $a, b \in L$ is at least $\epsilon$, if $a \ne b$.

For the distance is the length of $b - a$, and $b - a \in L$ because L is a subgroup.


(4.5) Proposition. Every discrete subgroup L of $\mathbb{R}^2$ has one of these forms:

(a) $L = \{0\}$.
(b) L is generated as an additive group by one nonzero vector a:
    $L = \{ma \mid m \in \mathbb{Z}\}$.
(c) L is generated by two linearly independent vectors a, b:
    $L = \{ma + nb \mid m, n \in \mathbb{Z}\}$.

Groups of the third type are called plane lattices, and the generating set (a, b) is
called a lattice basis. □

(4.6) Figure. A lattice in $\mathbb{R}^2$.


We defer the proof of Proposition (4.5) and turn to the second tool for studying
a discrete group of motions: its point group. Recall that there is a homomorphism
(2.13) $\varphi: M \to O$, whose kernel is T. If we restrict this homomorphism to G, we
obtain a homomorphism

(4.7)    $\varphi|_G: G \to O$.

Its kernel is $T \cap G$ (which is a subgroup isomorphic to the translation group $L_G$).
The point group $\bar{G}$ is the image of G in O. Thus $\bar{G}$ is a subgroup of O.

By definition, a rotation $\rho_\theta$ is in $\bar{G}$ if G contains some element of the form
$t_a \rho_\theta$. And we have seen (2.8) that $t_a \rho_\theta$ is a rotation through the angle $\theta$ about some
point in the plane. So the inverse image of an element $\rho_\theta \in \bar{G}$ consists of all of the
elements of G which are rotations through the angle $\theta$ about some point.

Similarly, let $\ell$ denote the line of reflection of $\rho_\theta r$. As we have noted before,
its angle with the x-axis is $\frac{1}{2}\theta$. The point group $\bar{G}$ contains $\rho_\theta r$ if there is some
element $t_a \rho_\theta r$ in G, and $t_a \rho_\theta r$ is a reflection or a glide reflection along a line parallel to
$\ell$. So the inverse image of $\rho_\theta r$ consists of all elements of G which are reflections and
glides along lines parallel to $\ell$.

Since G contains no small rotations, the same is true of its point group $\bar{G}$. So
$\bar{G}$ is discrete too: it is a discrete subgroup of O.

(4.8) Proposition. A discrete subgroup of O is a finite group.

We leave the proof of this proposition as an exercise. □

Combining Proposition (4.8) with Theorem (3.4), we find the following:

(4.9) Corollary. The point group $\bar{G}$ of a discrete group G is cyclic or dihedral. □


Here is the key observation which relates the point group to the translation 
group: 


(4.10) Proposition. Let G be a discrete subgroup of M, with translation group
$L = L_G$ and point group $\bar{G}$. The elements of $\bar{G}$ carry the group L to itself. In other
words, if $\bar{g} \in \bar{G}$ and $a \in L$, then $\bar{g}(a) \in L$.

We may restate this proposition by saying that $\bar{G}$ is contained in the group of
symmetries of L, when L is regarded as a set of points in the plane $\mathbb{R}^2$. However, it
is important to note that the original group G need not operate on L.


Proof. To say that $a \in L$ means that $t_a \in G$. So we have to show that if
$t_a \in G$ and $\bar{g} \in \bar{G}$, then $t_{\bar{g}(a)} \in G$. Now by definition of the point group, $\bar{g}$ is the
image of some element g of the group G: $\varphi(g) = \bar{g}$. We will prove the proposition
by showing that $t_{\bar{g}(a)}$ is the conjugate of $t_a$ by g. We write $g = t_b \rho$ or $t_b \rho r$, where
$\rho = \rho_\theta$. Then $\bar{g} = \rho$ or $\rho r$, according to the case. In the first case,

    $g t_a g^{-1} = t_b \rho t_a \rho^{-1} t_{-b} = t_b t_{\rho(a)} \rho \rho^{-1} t_{-b} = t_{\rho(a)}$,

as required. The computation is similar in the other case. □
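
The conjugation identity in the proof can be verified numerically. The sketch below assumes a representation that is not in the text: a motion is stored as a pair `(A, b)` standing for the map $v \mapsto Av + b$, and the sample values of $\theta$, a, and b are arbitrary.

```python
import math

def compose(m1, m2):
    """Motion m1 applied after m2; each is (A, b), meaning v -> A v + b."""
    (A1, b1), (A2, b2) = m1, m2
    A = [[sum(A1[i][k] * A2[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    b = [sum(A1[i][k] * b2[k] for k in range(2)) + b1[i] for i in range(2)]
    return (A, b)

def inverse(m):
    """Inverse of a motion with orthogonal A: v -> A^T (v - b)."""
    A, b = m
    At = [[A[j][i] for j in range(2)] for i in range(2)]
    return (At, [-sum(At[i][k] * b[k] for k in range(2)) for i in range(2)])

def translation(a):
    return ([[1.0, 0.0], [0.0, 1.0]], list(a))

theta = 2 * math.pi / 3
rho = [[math.cos(theta), -math.sin(theta)],
       [math.sin(theta), math.cos(theta)]]
g = (rho, [0.4, -1.7])        # g = t_b rho, with an arbitrary vector b
a = [2.0, 1.0]

# Conjugate t_a by g, and compute rho(a) for comparison.
conj = compose(compose(g, translation(a)), inverse(g))
rho_a = [sum(rho[i][k] * a[k] for k in range(2)) for i in range(2)]
```

The linear part of `conj` should be the identity and its translation part should equal `rho_a`, so the conjugate is the translation by $\rho(a)$, independent of b.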


The following proposition describes the point groups which can arise when the
translation group $L_G$ is a lattice.

(4.11) Proposition. Let $H \subset O$ be a finite subgroup of the group of symmetries of a
lattice L. Then

(a) Every rotation in H has order 1, 2, 3, 4, or 6.
(b) H is one of the groups $C_n$, $D_n$ where n = 1, 2, 3, 4, or 6. □

This proposition is often referred to as the Crystallographic Restriction. Notice that a
rotation of order 5 is ruled out by (4.11). There is no wallpaper pattern with fivefold
rotational symmetry. (However, there do exist “quasi-periodic” patterns with
fivefold symmetry.)

To prove Propositions (4.5) and (4.11), we begin by noting the following simple
lemma:
(4.12) Lemma. Let L be a discrete subgroup of $\mathbb{R}^2$.

(a) A bounded subset S of $\mathbb{R}^2$ contains only finitely many elements of L.
(b) If $L \ne \{0\}$, then L contains a nonzero vector of minimal length.

Proof.

(a) Recall that a subset S of $\mathbb{R}^n$ is called bounded if it is contained in some large
box, or if the points of S do not have arbitrarily large coordinates. Obviously, if S
is bounded, so is $L \cap S$. Now a bounded set which is infinite must contain some
elements arbitrarily close to each other; that is, the elements cannot be separated
by a fixed positive distance $\epsilon$. This is not the case for L, by (4.4). Thus $L \cap S$ is
finite.

(b) When we say that a nonzero vector a has minimal length, we mean that every
nonzero vector $v \in L$ has length at least |a|. We don't require the vector a to be
uniquely determined. In fact we couldn't require this, because whenever a has
minimal length, $-a$ does too.

Assume that $L \ne \{0\}$. To prove that a vector of minimal length exists, we let
$b \in L$ be any nonzero vector, and let S be the disc of radius |b| about the origin.
This disc is a bounded set, so it contains finitely many elements of L, including b.
We search through the nonzero vectors in this finite set to find one having minimal
length. It will be the required shortest vector. □
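
The search described in part (b) can be carried out explicitly when the lattice is given by two generators. The following is a sketch under that assumption (the lemma itself covers any discrete subgroup); the function name `shortest_vector` and the sample generators are hypothetical. The coefficient bounds come from Cramer's rule and are not part of the text.

```python
from math import floor, hypot

def shortest_vector(a, b):
    """Search the disc of radius |b| for a shortest nonzero vector of Za + Zb."""
    det = abs(a[0] * b[1] - a[1] * b[0])   # nonzero for independent a, b
    radius = hypot(*b)
    # If v = m*a + n*b has |v| <= radius, Cramer's rule gives
    # |m| <= |v||b|/det and |n| <= |v||a|/det, so the search is finite.
    M = floor(radius * hypot(*b) / det)
    N = floor(radius * hypot(*a) / det)
    best = b
    for m in range(-M, M + 1):
        for n in range(-N, N + 1):
            v = (m * a[0] + n * b[0], m * a[1] + n * b[1])
            if v != (0, 0) and hypot(*v) < hypot(*best):
                best = v
    return best

# Hypothetical generators; since their determinant is -1 they generate Z^2.
a, b = (7, 3), (5, 2)
v = shortest_vector(a, b)
```

For these generators the lattice is all of $\mathbb{Z}^2$, so the vector found has length 1. As the proof notes, the shortest vector is not unique, and which one is returned depends on the search order.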


Proof of Proposition (4.11). The second part of the proposition follows from
the first, by (3.6). To prove (a), let $\theta$ be the smallest nonzero angle of rotation in H,
and let a be a nonzero vector in L of minimal length. Then since H operates on L,
$\rho_\theta(a)$ is also in L; hence $b = \rho_\theta(a) - a \in L$. Since a has minimal length,
$|b| \ge |a|$. It follows that $\theta \ge 2\pi/6$.

[Figure: the vectors a, $\rho_\theta(a)$, and $b = \rho_\theta(a) - a$.]

Thus $\rho_\theta$ has order $\le 6$. The case that $\theta = 2\pi/5$ is also ruled out, because then the
element $b' = \rho_\theta^2(a) + a$ is shorter than a:

[Figure: the vectors a, $\rho_\theta(a)$, and $\rho_\theta^2(a)$.]

This completes the proof. □
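
The two length comparisons in this proof are easy to tabulate. The sketch below assumes a unit vector a of minimal length and tests, for $\theta = 2\pi/n$, whether $\rho_\theta(a) - a$ and $\rho_\theta^2(a) + a$ are each either zero or at least as long as a. The function name `survives` and the tolerance are choices made for the illustration.

```python
import math

def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def survives(n, eps=1e-9):
    """Check the two length conditions of the proof for theta = 2*pi/n, |a| = 1."""
    theta = 2 * math.pi / n
    a = (1.0, 0.0)
    ra = rotate(a, theta)
    r2a = rotate(ra, theta)
    b = (ra[0] - a[0], ra[1] - a[1])        # rho(a) - a
    b2 = (r2a[0] + a[0], r2a[1] + a[1])     # rho^2(a) + a
    # A lattice vector is either zero or at least as long as a.
    def allowed_length(v):
        length = math.hypot(*v)
        return length < eps or length >= 1 - eps
    return allowed_length(b) and allowed_length(b2)

allowed = [n for n in range(1, 13) if survives(n)]
```

Only the orders 1, 2, 3, 4, and 6 pass both tests: order 5 fails the second condition and every order above 6 fails the first, in agreement with part (a) of the proposition.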


Proof of Proposition (4.5). Let L be a discrete subgroup of $\mathbb{R}^2$. The possibility
that $L = \{0\}$ is included in the list. If $L \ne \{0\}$, there is a nonzero vector $a \in L$, and
we have two possibilities:

Case 1: All vectors in L lie on one line $\ell$ through the origin. We repeat an argument
used several times before, choosing a nonzero vector $a \in L$ of minimal length. We
claim that L is generated by a. Let v be any element of L. Then it is a real multiple
$v = ra$ of a, since $L \subset \ell$. Take out the integer part of r, writing $r = n + r_0$, where
n is an integer and $0 \le r_0 < 1$. Then $v - na = r_0 a$ has length less than |a|, and
since L is a group this element is in L. Therefore $r_0 = 0$. This shows that v is an
integer multiple of a, and hence that it is in the subgroup generated by a, as required.


Case 2: The elements of L do not lie on a line. Then L contains two linearly
independent vectors $a', b'$. We start with an arbitrary pair of independent vectors, and
we try to replace them by vectors which will generate the group L. To begin with,
we replace $a'$ by a shortest nonzero vector a of L on the line $\ell$ which $a'$ spans. When
this is done, the discussion of Case 1 shows that the subgroup $\ell \cap L$ is generated by
a. Next, consider the parallelogram $P'$ whose vertices are $0, a, b', a + b'$:

(4.13) Figure.

Since $P'$ is a bounded set, it contains only finitely many elements of L (4.12). We
may search through this finite set and choose a vector b whose distance to the line $\ell$
is as small as possible, but positive. We replace $b'$ by this vector. Let P be the
parallelogram with vertices $0, a, b, a + b$. We note that P contains no points of L except
for its vertices. To see this, notice first that any lattice point c in P which is not a vertex
must lie on one of the line segments $[b, a + b]$ or $[0, a]$. Otherwise the two points c
and $c - a$ would be closer to $\ell$ than b, and one of these points would lie in $P'$. Next,
the line segment $[0, a]$ is ruled out by the fact that a is a shortest vector on $\ell$.
Finally, if there were a point c on $[b, a + b]$, then $c - b$ would be an element of L on
the segment $[0, a]$. The proof is completed by the following lemma.


(4.14) Lemma. Let a, b be linearly independent vectors which are elements of a
subgroup L of $\mathbb{R}^2$. Suppose that the parallelogram P which they span contains no
element of L other than the vertices $0, a, b, a + b$. Then L is generated by a and b, that
is,

    $L = \{ma + nb \mid m, n \in \mathbb{Z}\}$.


Proof. Let v be an arbitrary element of L. Then since (a, b) is a basis of $\mathbb{R}^2$, v
is a linear combination, say $v = ra + sb$, where r, s are real numbers. We take out
the integer parts of r, s, writing $r = m + r_0$, $s = n + s_0$, where m, n are integers
and $0 \le r_0, s_0 < 1$. Let $v_0 = r_0 a + s_0 b = v - ma - nb$. Then $v_0$ lies in the
parallelogram P, and $v_0 \in L$. Hence $v_0$ is one of the vertices, and since $r_0, s_0 < 1$, it
must be the origin. Thus $v = ma + nb$. This completes the proof of the lemma and
of Proposition (4.5). □
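
The step of taking out integer parts can be made concrete with exact rational arithmetic. This is a sketch for vectors with integer coordinates; the function name `lattice_coordinates` and the sample basis are hypothetical.

```python
from fractions import Fraction
from math import floor

def lattice_coordinates(v, a, b):
    """Write v = r*a + s*b, then split r = m + r0, s = n + s0 with 0 <= r0, s0 < 1."""
    det = a[0] * b[1] - a[1] * b[0]                 # nonzero: a, b independent
    r = Fraction(v[0] * b[1] - v[1] * b[0], det)    # Cramer's rule
    s = Fraction(a[0] * v[1] - a[1] * v[0], det)
    m, n = floor(r), floor(s)
    return m, n, (r - m, s - n)

a, b = (2, 1), (1, 3)                               # a hypothetical lattice basis
v = (3 * a[0] - 4 * b[0], 3 * a[1] - 4 * b[1])      # v = 3a - 4b lies in the lattice
m, n, (r0, s0) = lattice_coordinates(v, a, b)

w = (1, 1)                                          # a point outside Za + Zb
_, _, frac_w = lattice_coordinates(w, a, b)
```

For the lattice point v both fractional parts vanish, so $v_0$ is the origin. For w the fractional parts are nonzero, which places $w_0 = w - ma - nb$ strictly inside the parallelogram; this is exactly what the hypothesis of the lemma forbids for elements of L.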


Let L be a lattice in $\mathbb{R}^2$. An element $v \in L$ is called primitive if it is not
an integer multiple of another vector in L. The preceding proof actually shows the
following:

(4.15) Corollary. Let L be a lattice, and let v be a primitive element of L. There
is an element $w \in L$ so that the set (v, w) is a lattice basis. □


Now let us go back to our discrete group of motions $G \subset M$ and consider the
rough classification of G according to the structure of its translation group $L_G$. If $L_G$
is the trivial group, then the homomorphism from G to its point group is bijective
and G is finite. We examined this case in Section 3.

The discrete groups G such that $L_G$ is infinite cyclic are the symmetry groups
of frieze patterns such as (1.3). The classification of these groups is left as an
exercise.

If $L_G$ is a lattice, then G is called a two-dimensional crystallographic group, or
a lattice group. These groups are the groups of symmetries of wallpaper patterns and
of two-dimensional crystals.

The fact that any wallpaper pattern repeats itself in two different directions is
reflected in the fact that its group of symmetries will always contain two independent
translations, which shows that $L_G$ is a lattice. It may also contain further elements
(rotations, reflections, or glides), but the crystallographic restriction limits the
possibilities and allows one to classify crystallographic groups into 17 types. The
classification takes into account not only the intrinsic structure of the group, but also the
type of motion that each group element represents. Representative patterns with the
various types of symmetry are illustrated in Figure (4.16).

Proposition (4.11) is useful for determining the point group of a crystallographic
group. For example, the brick pattern shown below has a rotational symmetry
through the angle $\pi$ about the centers of the bricks. All of these rotations represent
the same element $\rho_\pi$ of the point group $\bar{G}$. The pattern also has glide symmetry
along the dotted line indicated. Therefore the point group $\bar{G}$ contains a reflection.
By Proposition (4.11), $\bar{G}$ is a dihedral group. On the other hand, it is easy to see that
the only nontrivial rotations in the group G of symmetries are through the angle $\pi$.
Therefore $\bar{G} = D_2 = \{1, \rho_\pi, r, \rho_\pi r\}$.

The point group $\bar{G}$ and the translation group $L_G$ do not completely characterize
the group G. Things are complicated by the fact that a reflection in $\bar{G}$ need not be
the image of a reflection in G: it may be represented in G only by glides, as in the
brick pattern illustrated above.

As a sample of the methods required to classify the two-dimensional crystallographic
groups, we will describe those whose point group contains a rotation $\rho$
through the angle $\pi/2$. According to Proposition (4.11), the point group will be
either $C_4$ or $D_4$. Since any element of G which represents $\rho$ is also a rotation through
$\pi/2$ about some point p, we may choose p to be the origin. Then $\rho$ can be thought
of as an element of G too.


(4.17) Proposition. Let G be a lattice group whose point group contains a rotation
$\rho$ through the angle $\pi/2$. Choose coordinates so that the origin is a point of
rotation by $\pi/2$ in G. Let a be a shortest vector in $L = L_G$, let $b = \rho(a)$, and let
$c = \frac{1}{2}(a + b)$. Denote by r the reflection about the line spanned by a. Then G is
generated by one of the following sets: $\{t_a, \rho\}$, $\{t_a, \rho, r\}$, $\{t_a, \rho, t_c r\}$. Thus there are
three such groups.


Proof. We first note that L is a square lattice, generated by a and b. For, a is
in L by hypothesis, and Proposition (4.10) asserts that $b = \rho(a)$ is also in L. These
two vectors generate a square sublattice $L'$ of L. If $L \ne L'$, then according to
Lemma (4.14) there is an element $w \in L$ in the square whose vertices are
$0, a, b, a + b$ and which is not one of the vertices. But any such vector would be at a
distance less than |a| from at least one of the vertices v, and the difference $w - v$
would be in L but shorter than a, contrary to the choice of a. Thus $L = L'$, as
claimed.

Now the elements $t_a$ and $\rho$ are in G, and $\rho t_a \rho^{-1} = t_b$ (2.5). So the subgroup H
of G generated by the set $\{t_a, \rho\}$ contains $t_a$ and $t_b$. Hence it contains $t_w$ for every
$w \in L$. The elements of this group are the products $t_w \rho^i$:

    $H = \{t_w \rho^i \mid w \in L,\ 0 \le i \le 3\}$.

This is one of our groups. We now consider the possible additional elements which
G may contain.


Case 1: Every element of G preserves orientation. In this case, the point group is
$C_4$. Every element of G has the form $m = t_u \rho_\theta$, and if such an element is in G then
$\rho_\theta$ is in the point group. So $\rho_\theta = \rho^i$ for some i, and $m\rho^{-i} = t_u \in G$ too. Therefore
$u \in L$, and $m \in H$. So $G = H$ in this case.

Case 2: G contains an orientation-reversing motion. In this case the point group is
$D_4$, and it contains the reflection about the line spanned by a. We choose coordinates
so that this reflection becomes our standard reflection r. Then r will be represented
in G by an element of the form $m = t_u r$.

Case 2a: The element u is in L; that is, $t_u \in G$. Then $r \in G$ too, so G contains its
point group $\bar{G} = D_4$. If $m' = t_w \rho_\theta$ or $t_w \rho_\theta r$ is any element of G, then $\rho_\theta$ or $\rho_\theta r$,
respectively, is in G too; hence $t_w \in G$, and $w \in L$. Therefore G is the group
generated by the set $\{t_a, \rho, r\}$.

Case 2b: The element u is not in L. This is the hard case.


(4.18) Lemma. Let U be the set of vectors u such that $t_u r \in G$. Then

(a) $L + U = U$.
(b) $\rho U = U$.
(c) $U + rU \subset L$.

Proof. If $v \in L$ and $u \in U$, then $t_v$ and $t_u r$ are in G; hence $t_v t_u r = t_{v+u} r \in G$.
This shows that $u + v \in U$ and proves (a). Next, suppose that $u \in U$.
Then $\rho t_u r \rho = t_{\rho u} \rho r \rho = t_{\rho u} r \in G$. This shows that $\rho u \in U$ and proves (b).
Finally, if $u, v \in U$, then $t_u r t_v r = t_{u + rv} \in G$; hence $u + rv \in L$, which
proves (c). □


Part (a) of the lemma allows us to choose an element $u \in U$ lying in the
square whose vertices are $0, a, b, a + b$ and which is not on the line segments
$[a, a + b]$ and $[b, a + b]$. We write u in terms of the basis (a, b), say $u = xa + yb$,
where $0 \le x, y < 1$. Then $u + ru = 2xa$. Since $u + ru \in L$ by (4.18c), the possible
values for x are $0, \frac{1}{2}$. Next, $\rho u + a = (1 - y)a + xb$ lies in the square too, and
the same reasoning shows that y is 0 or $\frac{1}{2}$. Thus the three possibilities for u are $\frac{1}{2}a$,
$\frac{1}{2}b$, and $\frac{1}{2}(a + b) = c$. But if $u = \frac{1}{2}a$, then $\rho u = \frac{1}{2}b$, and $ru = u = \frac{1}{2}a$. So
$c = \frac{1}{2}(a + b) \in L$ (4.18b,c). This is impossible because c is shorter than a.
Similarly, the case $u = \frac{1}{2}b$ is impossible. So the only remaining case is $u = c$, which
means that the group G is generated by $\{t_a, \rho, t_c r\}$. □


5. ABSTRACT SYMMETRY: GROUP OPERATIONS 


The concept of symmetry may be applied to things other than geometric figures. For
example, complex conjugation $(a + bi) \mapsto (a - bi)$ may be thought of as a
symmetry of the complex numbers. It is compatible with most of the structure of $\mathbb{C}$: If $\bar\alpha$
denotes the complex conjugate of $\alpha$, then $\overline{\alpha + \beta} = \bar\alpha + \bar\beta$ and $\overline{\alpha\beta} = \bar\alpha\bar\beta$. Being
compatible with addition and multiplication, conjugation is called an automorphism
of the field $\mathbb{C}$. Of course, this symmetry is just the bilateral symmetry of the
complex plane about the real axis, but the statement that it is an automorphism refers to
its algebraic structure.

Another example of abstract “bilateral” symmetry is given by a cyclic group H
of order 3. We saw in Section 3 of Chapter 2 that this group has an automorphism $\varphi$,
which interchanges the two elements different from the identity.

The set of automorphisms of a group H (or of any other mathematical structure
H) forms a group Aut H, the law of composition being composition of maps. Each
automorphism should be thought of as a symmetry of H, in the sense that it is a
permutation of the elements of H which is compatible with the structure of H. But
instead of being a geometric figure with a rigid shape, the structure in this case is the
group law. The group of automorphisms of the cyclic group of order 3 contains two
elements: the identity map and the map $\varphi$.

So the words automorphism and symmetry are more or less synonymous, ex- 
cept that automorphism is used to describe a permutation of a set which preserves 
some algebraic structure, while symmetry often refers to a permutation which pre- 
serves a geometric structure. 

These examples are special cases of a more general concept, that of an operation
of a group on a set. Suppose we are given a group G and a set S. An operation
of G on S is a rule for combining elements $g \in G$ and $s \in S$ to get an element gs of
S. In other words, it is a law of composition, a map $G \times S \to S$, which we generally
write as multiplication:

    $g, s \mapsto gs$.

This rule is required to satisfy the following axioms:

(5.1)
(a) $1s = s$ for all s (1 is the identity of G).
(b) Associative law: $(gg')s = g(g's)$, for all $g, g' \in G$ and $s \in S$.


A set S with an operation of G is often called a G-set. This should really be 
called a left operation, because elements of G multiply on the left. 

Examples of this concept can be found in many places. For example, let $G = M$
be the group of all rigid motions of the plane. Then M operates on the set of points
of the plane, on the set of lines in the plane, on the set of triangles in the plane, and
so on. Or let G be the cyclic group $\{1, r\}$ of order 2, with $r^2 = 1$. Then G operates
on the set S of complex numbers, by the rule $r\alpha = \bar\alpha$. The fact that the axioms (5.1)
hold in a given example is usually clear.

The reason that such a law of composition is called an operation is this: If we
fix an element g of G but let $s \in S$ vary, then left multiplication by g defines a map
from S to itself; let us denote this map by $m_g$. Thus

(5.2)    $m_g: S \to S$

is defined by

    $m_g(s) = gs$.

This map describes the way the element g operates on S. Note that $m_g$ is a
permutation of S; that is, it is bijective. For the axioms show that it has the two-sided inverse
$m_{g^{-1}}$ = multiplication by $g^{-1}$:

    $m_{g^{-1}}(m_g(s)) = g^{-1}(gs) = (g^{-1}g)s = 1s = s$.

Interchanging the roles of g and $g^{-1}$ shows that $m_g(m_{g^{-1}}(s)) = s$ too.

The main thing that we can do to study a set S on which a group G operates is
to decompose the set into orbits. Let s be an element of S. The orbit of s in S is the
set

(5.3)    $O_s = \{s' \in S \mid s' = gs \text{ for some } g \in G\}$.


It is a subset of S. (The orbit is often written as $Gs = \{gs \mid g \in G\}$, in analogy with
the notation for cosets [Chapter 2 (6.1)]. We won't do this because Gs looks too
much like the notation $G_s$ for a stabilizer which we are about to introduce.) If we think
of elements of G as operating on S by permutations, then $O_s$ is the set of images of s
under the various permutations $m_g$. Thus, if $G = M$ is the group of motions and S is
the set of triangles in the plane, the orbit $O_\Delta$ of a given triangle $\Delta$ is the set of all
triangles congruent to $\Delta$. Another example of orbit was introduced when we proved
the existence of a fixed point for the operation of a finite group on the plane (3.1).

The orbits for a group action are equivalence classes for the relation

(5.4)    $s \sim s'$ if $s' = gs$ for some $g \in G$.

The proof that this is an equivalence relation is easy, so we omit it; we made a similar
verification when we introduced cosets in Section 6 of Chapter 2. Being equivalence
classes, the orbits partition the set S:


(5.5) S is a union of disjoint orbits. 


The group G operates on S by operating independently on each orbit. In other words, 
an element g € G permutes the elements of each orbit and does not carry elements 
of one orbit to another orbit. For example, the set of triangles of the plane can be 
partitioned into congruence classes, the orbits for the action of M. A motion m per- 
mutes each congruence class separately. Note that the orbits of an element s and of 
gs are equal. 
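
The decomposition into orbits can be computed directly when S is finite. The sketch below assumes the group is given by a list of generators, each a permutation of S written as a Python function; on a finite set, closing up under the generators already produces the whole orbit, because the inverse of a permutation of a finite set is one of its powers. The function name `orbits` and the example set are hypothetical.

```python
def orbits(generators, points):
    """Partition `points` into orbits of the group generated by `generators`."""
    remaining = set(points)
    result = []
    while remaining:
        start = remaining.pop()
        orbit, frontier = {start}, [start]
        while frontier:
            s = frontier.pop()
            for g in generators:
                t = g(s)
                if t not in orbit:        # a new element of the orbit
                    orbit.add(t)
                    frontier.append(t)
        remaining -= orbit                # orbits are disjoint, as in (5.5)
        result.append(frozenset(orbit))
    return result

# The cyclic group {1, r} acting by conjugation on the nine complex numbers
# a + bi with a, b in {-1, 0, 1}, each stored as the integer pair (a, b).
points = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
conjugate = lambda z: (z[0], -z[1])
orbit_list = orbits([conjugate], points)
```

Here the three real points are orbits by themselves and the six non-real points pair off, giving six orbits in all.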

If S consists of just one orbit, we say that G operates transitively on S. This 
means that every element of S is carried to every other one by some element of the 
group. Thus the group of symmetries of Figure (1.7) operates transitively on the set 
of its legs. The group M of rigid motions of the plane operates transitively on the set 
of points of the plane, and it operates transitively on the set of lines in the plane. It 
does not operate transitively on the set of triangles in the plane. 

The stabilizer of an element $s \in S$ is the subgroup $G_s$ of G of elements leaving
s fixed:

(5.6)    $G_s = \{g \in G \mid gs = s\}$.

It is clear that this is a subgroup. Just as the kernel of a group homomorphism
$\varphi: G \to G'$ tells us when two elements $x, y \in G$ have the same image, namely, if
$x^{-1}y \in \ker \varphi$ [Chapter 2 (5.13)], we can describe when two elements $x, y \in G$ act
in the same way on an element $s \in S$ in terms of the stabilizer $G_s$:

(5.7)    $xs = ys$ if and only if $x^{-1}y \in G_s$.

For $xs = ys$ implies $s = x^{-1}ys$, and conversely.

As an example of a nontrivial stabilizer, consider the action of the group M of 
rigid motions on the set of points of the plane. The stabilizer of the origin is the sub- 
group O of orthogonal operators. 

Or, if S is the set of triangles in the plane and $\Delta$ is a particular triangle which
happens to be equilateral, then the stabilizer of $\Delta$ is its group of symmetries, a
subgroup of M isomorphic to $D_3$ (see (3.4)). Note that when we say that a motion m
stabilizes a triangle $\Delta$, we don't mean that m fixes the points of $\Delta$. The only motion
which fixes every point of a triangle is the identity. We mean that in permuting the
set of triangles, the motion carries $\Delta$ to itself. It is important to be clear about this
distinction.
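
The distinction between stabilizing a set and fixing its points can be seen in a small finite example. This sketch uses a setting that is not in the text: the symmetric group on {0, 1, 2, 3}, operating on subsets. All names are illustrative.

```python
from itertools import permutations

G = list(permutations(range(4)))     # all permutations; g[i] is the image of i

def act(g, subset):
    """Operation of a permutation on a subset of {0, 1, 2, 3}."""
    return frozenset(g[i] for i in subset)

def mul(g, h):
    """The composite permutation: g applied after h."""
    return tuple(g[h[i]] for i in range(4))

def inv(g):
    out = [0] * 4
    for i, gi in enumerate(g):
        out[gi] = i
    return tuple(out)

s = frozenset({0, 1})
stabilizer = [g for g in G if act(g, s) == s]             # carries s to itself
pointwise = [g for g in G if all(g[i] == i for i in s)]   # fixes 0 and 1

# Criterion (5.7): xs = ys exactly when x^{-1} y lies in the stabilizer of s.
criterion_holds = all(
    (act(x, s) == act(y, s)) == (mul(inv(x), y) in stabilizer)
    for x in G for y in G
)
```

The stabilizer of the subset has four elements, while only two permutations fix both of its points, so stabilizing s is a weaker condition than fixing each element of s.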


6. THE OPERATION ON COSETS 


Let H be a subgroup of a group G. We saw in Section 6 of Chapter 2 that the left
cosets $aH = \{ah \mid h \in H\}$ form a partition of the group [Chapter 2 (6.3)]. We will
call the set of left cosets the coset space and will often denote it by G/H, copying
this notation from that used for quotient groups when the subgroup is normal.

The fundamental observation to be made is this: Though G/H is not a group 
unless the subgroup H is normal, nevertheless G operates on the coset space G/H in 
a natural way. The operation is quite obvious: Let g be an element of the group, and 
let C be a coset. Then gC is defined to be the coset 


(6.1) $gC = \{ gc \mid c \in C \}$.


Thus if C = aH, then gC is the coset gaH. It is clear that the axioms (5.1) for an 
operation are satisfied. 

Note that the group G operates transitively on G/H, because G/H is the orbit 
of the coset 1H = H. The stabilizer of the coset 1H is the subgroup $H \subset G$. Again,
note the distinction: Multiplication by an element $h \in H$ does not act trivially on the
elements of the coset 1H, but it sends that coset to itself.

To understand the operation on cosets, you should work carefully through the 
following example. Let G be the group $D_3$ of symmetries of an equilateral triangle.
As in (3.6), it may be described by generators x, y satisfying the relations $x^3 = 1$,
$y^2 = 1$, $yx = x^2y$. Let H = {1, y}. This is a subgroup of order 2. Its cosets are


(6.2) $C_1 = H = \{1, y\}$, $C_2 = \{x, xy\}$, $C_3 = \{x^2, x^2y\}$,


and G operates on $G/H = \{C_1, C_2, C_3\}$. So, as in (5.2), every element g of G determines
a permutation $m_g$ of $\{C_1, C_2, C_3\}$. The elements x, y operate as


(6.3) $m_x: 1 \mapsto 2 \mapsto 3 \mapsto 1$ and $m_y: 1 \mapsto 1,\ 2 \leftrightarrow 3$.

In fact, the six elements of G yield all six permutations of three elements, and so the
map 


$G \to S_3 \approx \mathrm{Perm}(G/H)$


is an isomorphism. Thus the dihedral group $G = D_3$ is isomorphic to the symmetric
group $S_3$. We already knew this.
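
The coset computation above can be reproduced mechanically. The sketch below is one possible model, not the only one: we assume the triangle's vertices are labeled 0, 1, 2, take x to be the rotation 0 → 1 → 2 → 0 and y the reflection fixing vertex 0, and index the cosets $C_1, C_2, C_3$ as 0, 1, 2. The function names are ours.

```python
def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

# Assumed model of D_3 as permutations of the vertex labels 0, 1, 2.
e = (0, 1, 2)
x = (1, 2, 0)          # rotation: 0 -> 1 -> 2 -> 0
y = (0, 2, 1)          # reflection fixing vertex 0
x2 = compose(x, x)
G = [e, x, x2, y, compose(x, y), compose(x2, y)]

H = frozenset([e, y])

def left_coset(a, subgroup):
    """Return the left coset a*subgroup as a frozenset."""
    return frozenset(compose(a, h) for h in subgroup)

# C_1 = H, C_2 = xH, C_3 = x^2 H, stored at indices 0, 1, 2.
cosets = [left_coset(a, H) for a in (e, x, x2)]

def coset_permutation(g):
    """Return m_g as a tuple: entry i is the index of the coset g*C_i."""
    return tuple(
        cosets.index(frozenset(compose(g, c) for c in C)) for C in cosets
    )

images = {coset_permutation(g) for g in G}
```

With this indexing, `coset_permutation(x)` is the 3-cycle and `coset_permutation(y)` fixes the first coset and interchanges the other two, in agreement with the computation by hand; `images` collects the permutations produced by all six elements.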

The following proposition relates an arbitrary group operation to the operation 
on cosets: 


(6.4) Proposition. Let S be a G-set, and let s be an element of S. Let H be the
stabilizer of s, and let $O_s$ be the orbit of s. There is a natural bijective map


$\varphi: G/H \to O_s$
defined by
$aH \mapsto as$.


This map is compatible with the operations of G in the sense that $\varphi(gC) = g\varphi(C)$
for every coset C and every element $g \in G$.


The proposition tells us that every group operation can be described in terms of 
the operations on cosets. For example, let $S = \{v_1, v_2, v_3\}$ be the set of vertices of an
equilateral triangle, and let G be the group of its symmetries, presented as above.
The element y is a reflection which stabilizes one of the vertices of the triangle, say
$v_1$. The stabilizer of this vertex is H = {1, y}, and its orbit is S. With suitable indexing,
the set (6.2) of cosets maps to S by the map $C_i \mapsto v_i$.


Proof of Proposition (6.4). It is clear that the map $\varphi$, if it exists, will be compatible
with the operation of the group. What is not so clear is that the rule $gH \mapsto gs$
defines a map at all. Since many symbols gH represent the same coset, we must
show that if a and b are group elements and if aH = bH, then as = bs too. This is
true, because we know that aH = bH if and only if b = ah for some h in H
[Chapter 2 (6.5)]. And when b = ah, then bs = ahs = as, because h fixes s. Next,
the orbit of s consists of the elements gs, and $\varphi$ carries gH to gs. Thus $\varphi$ maps G/H
onto $O_s$, and $\varphi$ is surjective. Finally, we show that $\varphi$ is injective. Suppose aH and
bH have the same image: as = bs. Then $s = a^{-1}bs$. Since H was defined to be the
stabilizer of s, this implies that $a^{-1}b = h \in H$. Thus $b = ah \in aH$, and so
aH = bH. This completes the proof. □


(6.5) Proposition. Let S be a G-set, and let $s \in S$. Let $s'$ be an element in the
orbit of s, say $s' = as$. Then


(a) The set of elements g of G such that $gs = s'$ is the left coset
$aG_s = \{ g \in G \mid g = ah \text{ for some } h \in G_s \}$.
(b) The stabilizer of $s'$ is a conjugate subgroup of the stabilizer of s:
$G_{s'} = aG_sa^{-1} = \{ g \in G \mid g = aha^{-1} \text{ for some } h \in G_s \}$.
We omit the proof. □
As an example, let us recompute the stabilizer of a point p in the plane, for the
operation of the group of motions. We have made this computation before, in
(2.11b). We have $p = t_p(0)$, and the stabilizer of the origin is the orthogonal group
O. Thus by (6.5b),


$G_p = t_p O t_p^{-1}$.


We know on the other hand that $G_p$ consists of rotations and reflections about the
point p. Those are the motions fixing p. So $t_p O t_p^{-1}$ consists of these elements. This
agrees with (2.11).


7. THE COUNTING FORMULA 


Let H be a subgroup of G. As we know from Chapter 2 (6.9), all the cosets of H in 
G have the same number of elements: $|H| = |aH|$. Since G is a union of nonoverlapping
cosets and the number of cosets is the index, which we write as [G : H] or
|G/H|, we have the fundamental formula for the order |G| of the group G (see
[Chapter 2 (6.10)]):

(7.1) $|G| = |H|\,|G/H|$.


Now let S be a G-set. Then we can combine Proposition (6.4) with (7.1) to get 
the following: 


(7.2) Proposition. Counting Formula: Let $s \in S$. Then


(order of G) = (order of stabilizer)(order of orbit)


$|G| = |G_s|\,|O_s|$.
Equivalently, the order of the orbit is equal to the index of the stabilizer:
$|O_s| = [G : G_s]$.


There is one such equation for every $s \in S$. As a consequence, the order of an orbit
divides the order of the group.
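
As a quick numerical check of the Counting Formula, here is a small sketch under assumptions of our own choosing: G is the symmetric group on {0, 1, 2, 3}, and S is the set of two-element subsets, on which G operates by applying a permutation to each member. The names `act`, `orbit`, and `stabilizer` are illustrative.

```python
from itertools import combinations, permutations

# Assumed example: G = S_4 operating on the 2-element subsets of {0, 1, 2, 3}.
G = list(permutations(range(4)))
S = [frozenset(c) for c in combinations(range(4), 2)]

def act(g, subset):
    """Apply the permutation g to every member of the subset."""
    return frozenset(g[i] for i in subset)

def orbit(s):
    """Return the orbit of s under G."""
    return {act(g, s) for g in G}

def stabilizer(s):
    """Return the list of group elements carrying s to itself."""
    return [g for g in G if act(g, s) == s]

s = frozenset({0, 1})
```

Here the stabilizer of s consists of the permutations that preserve {0, 1}, and its order times the order of the orbit recovers |G| = 24.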

A more elementary formula uses the partition of S into orbits to count its elements.
We label the different orbits which make up S in some way, say as $O_1, \ldots, O_k$. Then


(7.3) $|S| = |O_1| + |O_2| + \cdots + |O_k|$.


These simple formulas have a great number of applications. 


(7.4) Example. Consider the group G of orientation-preserving symmetries of a
regular dodecahedron D. It follows from the discussion of Section 8 of Chapter 4
that these symmetries are all rotations. It is tricky to count them without error. Consider
the action of G on the set S of the faces of D. The stabilizer of a face s is the
group of rotations by multiples of $2\pi/5$ about a perpendicular through the center of
s. So the order of $G_s$ is 5. There are 12 faces, and G acts transitively on them. Thus
$|G| = 5 \cdot 12 = 60$. Or, G operates transitively on the vertices v of D. There are
three rotations, including 1, which fix a vertex, so $|G_v| = 3$. There are 20 vertices;
hence $|G| = 3 \cdot 20 = 60$, which checks. There is a similar computation for edges.
If e is an edge, then $|G_e| = 2$, so since $60 = 2 \cdot 30$, the dodecahedron has 30
edges.
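
The same three-way count can be carried out by machine for a solid whose rotations are easy to write down. The sketch below swaps in the cube for the dodecahedron, which is our substitution and not the example of the text: we assume the cube has vertices (±1, ±1, ±1), so that its rotations are the signed permutation matrices of determinant 1, and we represent face centers, edge midpoints, and vertices by integer vectors. All names are illustrative.

```python
from itertools import permutations, product

def perm_sign(p):
    """Sign of a permutation, computed by counting inversions."""
    sign = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                sign = -sign
    return sign

# A pair (p, s) stands for the matrix sending e_i to s[i] * e_{p[i]};
# its determinant is perm_sign(p) * s[0] * s[1] * s[2].
rotations = [
    (p, s)
    for p in permutations(range(3))
    for s in product((1, -1), repeat=3)
    if perm_sign(p) * s[0] * s[1] * s[2] == 1
]

def act(g, v):
    """Apply the signed permutation matrix g to the vector v."""
    p, s = g
    w = [0, 0, 0]
    for i in range(3):
        w[p[i]] = s[i] * v[i]
    return tuple(w)

all_points = list(product((-1, 0, 1), repeat=3))
faces = [v for v in all_points if sum(map(abs, v)) == 1]      # face centers
edges = [v for v in all_points if sum(map(abs, v)) == 2]      # edge midpoints
vertices = [v for v in all_points if sum(map(abs, v)) == 3]   # vertices

def stabilizer_order(v):
    return sum(1 for g in rotations if act(g, v) == v)

def orbit_size(v):
    return len({act(g, v) for g in rotations})
```

For each of the three sets, the product of `stabilizer_order` and `orbit_size` at any element should equal the number of rotations, just as the face, vertex, and edge counts agree for the dodecahedron.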


Following our general principle, we should study restriction of an operation of 
a group G to a subgroup. Suppose that G operates on a set S, and let H be a subgroup 
of G. We may restrict the operation, to get an operation of H on S. Doing so leads to 
more numerical relations. 

Clearly, the H-orbit of an element s will be contained in its G-orbit. So we 
may take a single G-orbit and decompose it into H-orbits. We count the orders of 
these H-orbits, obtaining another formula. For example, let S be the set of 12 faces 
of the dodecahedron, and let H be the stabilizer of a particular face s. Then H also 
fixes the face opposite to s, and so there are two H-orbits of order 1. The remaining 
faces make up two orbits of order 5. In this case, (7.3) reads as follows. 


$12 = 1 + 1 + 5 + 5$.


Or let S be the set of faces, and let K be the stabilizer of a vertex. Then K does not 
fix any face, so every K-orbit has order 3: 

$12 = 3 + 3 + 3 + 3$.

These relations give us a way of relating several subgroups of a group. 


We close the section with a simple application of this procedure to the case that 
the G-set is the coset space of a subgroup: 


(7.5) Proposition. Let H and K be subgroups of a group G. Then the index of
$H \cap K$ in H is at most equal to the index of K in G:


$[H : H \cap K] \le [G : K]$.

Proof. To minimize confusion, let us denote the coset space G/K by S, and the
coset 1K by s. Thus $|S| = [G : K]$. As we have already remarked, the stabilizer of
s is the subgroup K. We now restrict the action of G to the subgroup H and decompose
S into H-orbits. The stabilizer of s for this restricted operation is obviously
$H \cap K$. We don't know much about the H-orbit O of s except that it is a subset of S.
We now apply Proposition (7.2), which tells us that $|O| = [H : H \cap K]$. Therefore
$[H : H \cap K] = |O| \le |S| = [G : K]$, as required. □
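
The proof can be traced in a concrete case. In the sketch below we assume, purely for illustration, that G is the symmetric group on {0, 1, 2, 3}, that H is the stabilizer of the point 0, and that K is the stabilizer of the subset {0, 1}; the variable names are ours.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

# Assumed example: G = S_4 with two explicitly chosen subgroups.
G = list(permutations(range(4)))
H = [g for g in G if g[0] == 0]                  # stabilizer of the point 0
K = [g for g in G if {g[0], g[1]} == {0, 1}]     # stabilizer of the subset {0, 1}

def left_coset(a, subgroup):
    """Return the left coset a*subgroup as a frozenset."""
    return frozenset(compose(a, k) for k in subgroup)

S = {left_coset(a, K) for a in G}          # the coset space G/K
H_orbit = {left_coset(h, K) for h in H}    # the H-orbit of the coset 1K
H_cap_K = [g for g in H if g in K]         # stabilizer of 1K in H
```

The order of `H_orbit` is the index of `H_cap_K` in H, and it cannot exceed the order of S, which is the index of K in G.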


8. PERMUTATION REPRESENTATIONS 


By its definition, the symmetric group $S_n$ operates on the set $S = \{1, \ldots, n\}$. A permutation
representation of a group G is a homomorphism


(8.1) $\varphi: G \to S_n$.


Given any such representation, we obtain an operation of G on $S = \{1, \ldots, n\}$ by letting
$m_g$ (5.2) be the permutation $\varphi(g)$. In fact, operations of a group G on $\{1, \ldots, n\}$
correspond in a bijective way to permutation representations.

More generally, let S be any set, and denote by Perm(S) the group of its per- 
mutations. Let G be a group. 


(8.2) Proposition. There is a bijective correspondence


(operations of G on S) $\longleftrightarrow$ (homomorphisms $G \to \mathrm{Perm}(S)$)


defined in this way: Given an operation, we define $\varphi: G \to \mathrm{Perm}(S)$ by the rule
$\varphi(g) = m_g$, where $m_g$ is multiplication by g (5.2).


Let us show that $\varphi$ is a homomorphism, leaving the rest of the proof of (8.2) as
an exercise. We've already noted in Section 5 that $m_g$ is a permutation. So as defined
above, $\varphi(g) \in \mathrm{Perm}(S)$. The axiom for a homomorphism is $\varphi(xy) = \varphi(x)\varphi(y)$, or
$m_{xy} = m_x m_y$, where multiplication is composition of permutations. So we have to
show that $m_{xy}(s) = m_x(m_y(s))$ for every $s \in S$. By Definition (5.2), $m_{xy}(s) = (xy)s$
and $m_x(m_y(s)) = x(ys)$. The associative law (5.1b) for group operations shows that
$(xy)s = x(ys)$, as required. □


The isomorphism $D_3 \to S_3$ obtained in Section 6 by the action of $D_3$ on the
cosets of H (6.2) is a particular example of a permutation representation. But a homomorphism
need not be injective or surjective. If $\varphi: G \to \mathrm{Perm}(S)$ happens to be
injective, we say that the corresponding operation is faithful. So to be faithful, the
operation must have the property that $m_g \ne$ identity, unless g = 1, or


if $gs = s$ for every $s \in S$, then g = 1.


The operation of the group of motions M on the set S of equilateral triangles in the 
plane is faithful, because the identity is the only motion which fixes all triangles. 




The rest of this section contains a few applications of permutation representa- 
tions. 


(8.3) Proposition. The group $GL_2(\mathbb{F}_2)$ of invertible matrices with mod 2
coefficients is isomorphic to the symmetric group $S_3$.


Proof. Let us denote the field $\mathbb{F}_2$ by F, and the group $GL_2(\mathbb{F}_2)$ by G. We
have listed the six elements of G before [Chapter 3 (2.10)]. Let $V = F^2$ be
the space of column vectors. This space consists of the following four vectors:
$V = \{0, e_1, e_2, e_1 + e_2\}$. The group G operates on V and fixes 0, so it operates on the
set of three nonzero vectors, which form one orbit. This gives us a permutation representation
$\varphi: G \to S_3$. Now the image of $e_1$ under multiplication by a matrix
$P \in G$ is the first column of P, and similarly the image of $e_2$ is the second column
of P. Therefore P cannot operate trivially on these two elements unless it is the
identity. This shows that the operation of G is faithful, and hence that the map $\varphi$ is
injective. Since both groups have order 6, $\varphi$ is an isomorphism. □
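
Because everything here is finite and small, the argument can be confirmed by enumeration. The following sketch lists the invertible 2 × 2 matrices mod 2 and records how each permutes the three nonzero vectors; the ordering of `nonzero_vectors` and the function names are our own choices.

```python
from itertools import product

def mat_mul(P, Q):
    """Product of two 2x2 matrices with entries reduced mod 2."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) % 2 for j in range(2))
        for i in range(2)
    )

def mat_vec(P, v):
    """Product of a 2x2 matrix and a column vector, reduced mod 2."""
    return tuple(sum(P[i][k] * v[k] for k in range(2)) % 2 for i in range(2))

# Invertible matrices over F_2: those with determinant 1 mod 2.
matrices = [
    ((a, b), (c, d))
    for a, b, c, d in product((0, 1), repeat=4)
    if (a * d - b * c) % 2 == 1
]

nonzero_vectors = [(1, 0), (0, 1), (1, 1)]   # e1, e2, e1 + e2

def as_permutation(P):
    """Entry i is the index of the vector that P sends nonzero_vectors[i] to."""
    return tuple(nonzero_vectors.index(mat_vec(P, v)) for v in nonzero_vectors)

images = {as_permutation(P) for P in matrices}
```

There are six matrices and `images` has six elements, so the operation is faithful; one can also check that `as_permutation` turns matrix products into compositions of permutations.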


(8.4) Proposition. Let $c_g$ denote conjugation by g, the map $c_g(x) = gxg^{-1}$. The
map $f: S_3 \to \mathrm{Aut}(S_3)$ from the symmetric group to its group of automorphisms
which is defined by the rule $g \mapsto c_g$ is bijective.


Proof. Let A denote the group of automorphisms of $S_3$. We know from Chapter
2 (3.4) that $c_g$ is an automorphism. Also, $c_{gh} = c_g c_h$ because $c_{gh}(x) =
(gh)x(gh)^{-1} = ghxh^{-1}g^{-1} = c_g(c_h(x))$ for all x. This shows that f is a homomorphism.
Now conjugation by g is the identity if and only if g is in the center of the
group. The center of $S_3$ is trivial, so f is injective.

It is to prove surjectivity of f that we look at a permutation representation of
A. The group A operates on the set $S_3$ in the obvious way; namely, if $\alpha$ is an automorphism
and $s \in S_3$, then $\alpha s = \alpha(s)$. Elements of $S_3$ of different orders will be in
distinct orbits for this operation. So A operates on the subset of $S_3$ of elements of order 2.
This set contains the three elements $\{y, xy, x^2y\}$. If an automorphism $\alpha$ fixes
both xy and y, then it also fixes their product xyy = x. Since x and y generate $S_3$,
the only such automorphism is the identity. This shows that the operation of A on
$\{y, xy, x^2y\}$ is faithful and that the associated permutation representation
$A \to \mathrm{Perm}\{y, xy, x^2y\}$ is injective. So the order of A is at most 6. Since f is injective
and the order of $S_3$ is 6, it follows that f is bijective. □
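
Proposition (8.4) can also be confirmed by exhaustive search, since there are only 720 bijections of a six-element set to test. The sketch below is a brute-force check and not the method of the proof: it models the symmetric group as permutations of {0, 1, 2}, and the names `is_automorphism`, `automorphisms`, and `conjugations` are ours.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(3)))   # the symmetric group S_3

def is_automorphism(f):
    """f is a dict giving a bijection G -> G; test the homomorphism axiom."""
    return all(
        f[compose(a, b)] == compose(f[a], f[b]) for a in G for b in G
    )

# Try every bijection of the underlying set and keep the homomorphisms.
automorphisms = []
for images in permutations(G):
    f = dict(zip(G, images))
    if is_automorphism(f):
        automorphisms.append(f)

# The conjugation maps c_g(x) = g x g^{-1}, one for each g in G.
conjugations = [
    {x: compose(compose(g, x), inverse(g)) for x in G} for g in G
]
```

The search finds exactly six automorphisms, and the six conjugation maps are distinct and account for all of them.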


(8.5) Proposition. The group of automorphisms of the cyclic group of order p is
isomorphic to the multiplicative group $\mathbb{F}_p^\times$ of nonzero elements of $\mathbb{F}_p$.


Proof. The method here is to use the additive group $\mathbb{F}_p^+$ as the model for a
cyclic group of order p. It is generated by the element 1. Let us denote the multiplicative
group $\mathbb{F}_p^\times$ by G. Then G operates on $\mathbb{F}_p^+$ by left multiplication, and this
operation defines an injective homomorphism $\varphi: G \to \mathrm{Perm}(\mathbb{F}_p)$ to the group of
permutations of the set $\mathbb{F}_p$ of p elements.
Next, the group $A = \mathrm{Aut}(\mathbb{F}_p^+)$ of automorphisms is a subgroup of $\mathrm{Perm}(\mathbb{F}_p^+)$.
The distributive law shows that multiplication by an element $a \in \mathbb{F}_p^\times$ is an automorphism
of $\mathbb{F}_p^+$. It is bijective, and $a(x + y) = ax + ay$. Therefore the image of
$\varphi: G \to \mathrm{Perm}(\mathbb{F}_p^+)$ is contained in the subgroup A. Finally, an automorphism of
$\mathbb{F}_p^+$ is determined by where it sends the generator 1, and the image of 1 cannot be
zero. Using the operations of G, we can send 1 to any nonzero element. Therefore $\varphi$
is a surjection from G onto A. Being both injective and surjective, $\varphi$ is an isomorphism. □
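
For a small prime the proposition can be verified directly. The sketch below assumes p = 5, a choice of ours, and compares the additive automorphisms of the integers mod p, found by brute force, with the maps given by multiplication by a nonzero residue.

```python
from itertools import permutations

p = 5   # assumed small prime, chosen for illustration

def is_additive(f):
    """f is a tuple with f[x] the image of x; test f(a + b) = f(a) + f(b) mod p."""
    return all(
        f[(a + b) % p] == (f[a] + f[b]) % p
        for a in range(p)
        for b in range(p)
    )

# Every bijection of {0, ..., p-1} that respects addition mod p.
automorphisms = [f for f in permutations(range(p)) if is_additive(f)]

# Multiplication by each nonzero residue a, written as a tuple of images.
multiplications = [tuple((a * x) % p for x in range(p)) for a in range(1, p)]
```

The two lists contain the same p − 1 maps, each determined by the image of the generator 1.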


9. FINITE SUBGROUPS OF THE ROTATION GROUP 


In this section, we will apply the Counting Formula to classify finite subgroups of
the rotation group $SO_3$, which was defined in Chapter 4 (5.4). As happens with finite
groups of motions of the plane, there are rather few finite subgroups of $SO_3$, and all
of them are symmetry groups of familiar figures.


(9.1) Theorem. Every finite subgroup G of $SO_3$ is one of the following:


$C_k$: the cyclic group of rotations by multiples of $2\pi/k$ about a line;
$D_k$: the dihedral group (3.4) of symmetries of a regular k-gon;
T: the tetrahedral group of twelve rotations carrying a regular tetrahedron to itself;
O: the octahedral group of order 24 of rotations of a cube, or of a regular octahedron;
I: the icosahedral group of 60 rotations of a regular dodecahedron or a regular icosahedron.


We will not attempt to classify the infinite subgroups.


Proof. Let G be a finite subgroup of $SO_3$, and denote its order by N. Every element
g of G except the identity is a rotation about a line $\ell$, and this line is obviously
unique. So g fixes exactly two points of the unit sphere S in $\mathbb{R}^3$, namely the two
points of intersection $\ell \cap S$. We call these points the poles of g. Thus a pole is a
point p on the unit sphere such that gp = p for some element $g \ne 1$ of G. For example,
if G is the group of rotational symmetries of a tetrahedron A, then the poles
will be the points of S lying over the vertices, the centers of faces, and the centers of
edges of A.




Let P denote the set of all poles. 


(9.2) Lemma. The set P is carried to itself by the action of G on the sphere. So G
operates on P. 


Proof. Let p be a pole, say the pole of $g \in G$. Let x be an arbitrary element
of G. We have to show that xp is a pole, meaning that xp is left fixed by
some element $g'$ of G other than the identity. The required element is $xgx^{-1}$:


$xgx^{-1}(xp) = xgp = xp$, and $xgx^{-1} \ne 1$ because $g \ne 1$. □


We are now going to get information about the group by counting the poles. 
Since every element of G except 1 has two poles, our first guess might be that there
are $2N - 2$ poles altogether. This isn't quite correct, because the same point p may
be a pole for more than one group element. 

The stabilizer of a pole p is the group of all of the rotations about the line
$\ell = (0, p)$ which are in G. This group is cyclic and is generated by the rotation of
smallest angle $\theta$ in G. [See the proof of Theorem (3.4a).] If the order of the stabilizer
is $r_p$, then $\theta = 2\pi/r_p$.

We know that $r_p > 1$ because, since p is a pole, the stabilizer $G_p$ contains an
element besides 1. By the Counting Formula (7.2),


$|G_p|\,|O_p| = |G|$.
We write this equation as
(9.3) $r_p n_p = N$,


where $n_p$ is the number of poles in the orbit $O_p$ of p.

The set of elements of G with a given pole p is the stabilizer $G_p$, minus the
identity element. So there are $(r_p - 1)$ group elements with p as pole. On the other
hand, every group element g except 1 has two poles. Having to subtract 1 everywhere
is a little confusing here, but the correct relation is


(9.4) $\sum_{p \in P} (r_p - 1) = 2N - 2$.


Now if p and p' are in the same orbit, then the stabilizers $G_p$ and $G_{p'}$ have the
same order. This is because $O_p = O_{p'}$ and $|G| = |G_p|\,|O_p| = |G_{p'}|\,|O_{p'}|$. Therefore
we can collect together the terms on the left side of (9.4) which correspond
to poles in a given orbit $O_p$. There are $n_p$ such terms, so the number of poles collected
together is $n_p(r_p - 1)$. Let us number the orbits in some way, as $O_1, O_2, \ldots$.
Then


$\sum_i n_i(r_i - 1) = 2N - 2$,


where $n_i = |O_i|$, and $r_i = |G_p|$ for any $p \in O_i$. Since $N = n_i r_i$, we can divide both
sides by N and switch sides, to get the famous formula


(9.5) $2 - \frac{2}{N} = \sum_i \left(1 - \frac{1}{r_i}\right)$.


This formula may not look very promising at first glance, but actually it tells us a
great deal. The left side is less than 2, while each term on the right is at least 1/2. It
follows that there can be at most three orbits!

The rest of the classification is made by listing the various possibilities: 


One orbit: $2 - \frac{2}{N} = 1 - \frac{1}{r}$. This is impossible, because $2 - \frac{2}{N} \ge 1$, while
$1 - \frac{1}{r} < 1$.

Two orbits: $2 - \frac{2}{N} = \left(1 - \frac{1}{r_1}\right) + \left(1 - \frac{1}{r_2}\right)$, that is, $\frac{2}{N} = \frac{1}{r_1} + \frac{1}{r_2}$.


We know that $r_i \le N$, because $r_i$ divides N. This equation can hold only if
$r_1 = r_2 = N$. Thus $n_1 = n_2 = 1$. There are two poles p, p', both fixed by every element
of the group. Obviously, G is the cyclic group $C_N$ of rotations about the line $\ell$
through p and p'.


Three orbits: This is the main case: Formula (9.5) reduces to


$\frac{2}{N} = \frac{1}{r_1} + \frac{1}{r_2} + \frac{1}{r_3} - 1$.


We arrange the $r_i$ in increasing order. Then $r_1 = 2$. For if all $r_i$ were at least 3, then
the right side would be $\le 0$, which is impossible.


Case 1: At least two of the orders $r_i$ are 2: $r_1 = r_2 = 2$. The third order $r_3 = r$ can
be arbitrary, and $N = 2r$. Then $n_3 = 2$: There is one pair of poles {p, p'} making
the orbit $O_3$. Every element g either fixes p and p' or interchanges them. So the elements
of G are rotations about $\ell = (p, p')$, or else they are rotations by $\pi$ about a
line $\ell'$ perpendicular to $\ell$. It is easily seen that G is the group of rotations fixing a
regular r-gon A, the dihedral group $D_r$. The polygon A lies in the plane perpendicular
to $\ell$, and the vertices and the centers of faces of A correspond to the remaining
poles. The bilateral (reflection) symmetries of the polygon in $\mathbb{R}^2$ have become
rotations through the angle $\pi$ when A is put into $\mathbb{R}^3$.

Case 2: Only one $r_i$ is 2: The triples $r_1 = 2$, $r_2 \ge 4$, $r_3 \ge 4$ are impossible, because
$1/2 + 1/4 + 1/4 - 1 = 0$. Similarly, $r_1 = 2$, $r_2 = 3$, $r_3 \ge 6$ cannot occur because
$1/2 + 1/3 + 1/6 - 1 = 0$. There remain only three possibilities:


(9.6)
(i) $r_i = (2, 3, 3)$, N = 12;
(ii) $r_i = (2, 3, 4)$, N = 24;
(iii) $r_i = (2, 3, 5)$, N = 60.
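
The case analysis can be double-checked for any particular order N by searching over the possible stabilizer orders with exact rational arithmetic. The sketch below is such a search, assuming only that each $r_i$ is a divisor of N with $r_i \ge 2$ and that there are at most three orbits; the function name `pole_orbit_types` is ours. It tests the numerical condition (9.5) only and says nothing about which groups actually realize a solution.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def pole_orbit_types(N, max_orbits=3):
    """List the tuples (r_1 <= ... <= r_k) of divisors of N, each at least 2,
    that satisfy 2 - 2/N = sum(1 - 1/r_i), for k up to max_orbits."""
    divisors = [r for r in range(2, N + 1) if N % r == 0]
    left_side = 2 - Fraction(2, N)
    found = []
    for k in range(1, max_orbits + 1):
        for rs in combinations_with_replacement(divisors, k):
            if sum(1 - Fraction(1, r) for r in rs) == left_side:
                found.append(rs)
    return found
```

For N = 12, for instance, the search returns the cyclic solution with two orbits, the dihedral solution (2, 2, 6), and the solution (2, 3, 3) of (9.6i).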


It remains to analyze these three cases. We will indicate the configurations
briefly.


(9.7)
(i) $n_i = (6, 4, 4)$. The poles in the orbit $O_2$ are the vertices of a regular tetrahedron
A, and G is the group of rotations fixing it: G = T. Here $n_1$ is the number
of edges of A, and $n_2, n_3$ are the numbers of vertices and faces of A.
(ii) $n_i = (12, 8, 6)$. The poles in $O_2$ are the vertices of a cube, and the poles in $O_3$
are the vertices of a regular octahedron. G = O is the group of their rotations.
The integers $n_i$ are the numbers of edges, vertices, and faces of a cube.
(iii) $n_i = (30, 20, 12)$. The poles of $O_2$ are the vertices of a regular dodecahedron,
and those in $O_3$ are the vertices of a regular icosahedron: G = I.


There is still some work to be done to prove the assertions of (9.7). Intu- 
itively, the poles in an orbit should be the vertices of a regular polyhedron because 
they form a single orbit and are therefore evenly spaced on the sphere. However this 
is not quite accurate, because the centers of the edges of a cube, for example, form a 
single orbit but do not span a regular polyhedron. (The figure they span is called a 
truncated polyhedron.)

As an example, consider (9.7iii). Let p be one of the 12 poles in $O_3$, and let q
be one of the poles of $O_2$ nearest to p. Since the stabilizer of p is of order 5 and operates
on $O_2$ (because G does), the images of q provide a set of five nearest neighbors
to p, the poles obtained from q by the five rotations about p in G. Therefore the
number of poles of $O_2$ nearest to p is a multiple of 5, and it is easily seen that 5 is the
only possibility. So these five poles are the vertices of a regular pentagon. The 12
pentagons so defined form a regular dodecahedron. □

We close this chapter by remarking that our discussion of the motions of the 
plane has analogues for the group $M_3$ of rigid motions of 3-space. In particular, one
can define the notion of crystallographic group, which is a discrete subgroup whose
translation group is a three-dimensional lattice L. To say that L is a lattice means
that there are three linearly independent vectors a, b, c in $\mathbb{R}^3$ such that
$t_a, t_b, t_c \in G$. The crystallographic groups are analogous to lattice groups in
$M = M_2$, and crystals form examples of three-dimensional configurations having
such groups as symmetry. We imagine the crystal to be infinitely large. Then the fact
that the molecules are arranged regularly implies that they form an array having 
three independent translational symmetries. It has been shown that there are 230 
types of crystallographic groups, analogous to the 17 lattice groups (4.15). This is 
too long a list to be very useful, and so crystals have been classified more crudely 
into seven crystal systems. For more about this, and for a discussion of the 32 crys- 
tallographic point groups, look in a book on crystallography. 


A good inheritance is worth more than the prettiest problem in geometry,
because it takes the place of a general method
and serves to solve many problems.


Gottfried Wilhelm Leibnitz 


EXERCISES 


1. Symmetry of Plane Figures


1. Prove that the set of symmetries of a figure F in the plane forms a group. 
2. List all symmetries of (a) a square and (b) a regular pentagon. 
3. List all symmetries of the following figures. 
(a) (1.4) (b) (1.5) (c) (1.6) (d) (1.7)
4. Let G be a finite group of rotations of the plane about the origin. Prove that G is cyclic. 


2. The Group of Motions of the Plane 


1. Compute the fixed point of $t_a\rho_\theta$ algebraically.
2. Verify the rules (2.5) by explicit calculation, using the definitions (2.3).
3. Prove that O is not a normal subgroup of M.
4. Let m be an orientation-reversing motion. Prove that $m^2$ is a translation.
5. Let SM denote the subset of orientation-preserving motions of the plane. Prove that SM
is a normal subgroup of M, and determine its index in M.
6. Prove that a linear operator on $\mathbb{R}^2$ is a reflection if and only if its eigenvalues are 1 and
−1, and its eigenvectors are orthogonal.
7. Prove that a conjugate of a reflection or a glide reflection is a motion of the same type,
and that if m is a glide reflection then the glide vectors of m and of its conjugates have
the same length.
8. Complete the proof that (2.13) is a homomorphism.
9. Prove that the map $M \to \{1, r\}$ defined by $t_a\rho_\theta \mapsto 1$, $t_a\rho_\theta r \mapsto r$ is a homomorphism.
10. Compute the effect of rotation of the axes through an angle $\eta$ on the expressions $t_a\rho_\theta$ and
$t_a\rho_\theta r$ for a motion.
11. (a) Compute the eigenvalues and eigenvectors of the linear operator $m = \rho_\theta r$.
(b) Prove algebraically that m is a reflection about a line through the origin, which subtends
an angle of $\frac{1}{2}\theta$ with the x-axis.
(c) Do the same thing as in (b) geometrically.
12. Compute the glide vector of the glide $t_a\rho_\theta r$ in terms of a and $\theta$.
13. (a) Let m be a glide reflection along a line $\ell$. Prove geometrically that a point x lies on $\ell$
if and only if x, m(x), $m^2(x)$ are colinear.
(b) Conversely, prove that if m is an orientation-reversing motion and x is a point such
that x, m(x), $m^2(x)$ are distinct points on a line $\ell$, then m is a glide reflection
along $\ell$.

14. Find an isomorphism from the group SM to the subgroup of $GL_2(\mathbb{C})$ of matrices of the
form $\begin{bmatrix} a & b \\ 0 & 1 \end{bmatrix}$ with $|a| = 1$.
15. (a) Write the formulas for the motions (2.3) in terms of the complex variable
$z = x + iy$.
(b) Show that every motion has the form $m(z) = \alpha z + \beta$ or $m(z) = \alpha \bar{z} + \beta$, where
$|\alpha| = 1$ and $\beta$ is an arbitrary complex number.


3. Finite Groups of Motions


1. Let $D_n$ denote the dihedral group (3.6). Express the product $x^2yx^{-1}y^{-1}x^3y^3$ in the form
$x^iy^j$ in $D_n$.


2. List all subgroups of the group $D_4$, and determine which are normal.
3. Find all proper normal subgroups and identify the quotient groups of the groups $D_{13}$ and
$D_{15}$.


4. (a) Compute the cosets of the subgroup $H = \{1, x^5\}$ in the dihedral group $D_{10}$ explicitly.
(b) Prove that $D_{10}/H$ is isomorphic to $D_5$.
(c) Is $D_{10}$ isomorphic to $D_5 \times H$?


5. List the subgroups of $G = D_6$ which do not contain $N = \{1, x^3\}$.
6. Prove that every finite subgroup of M is a conjugate subgroup of one of the standard subgroups
listed in Corollary (3.5).


4. Discrete Groups of Motions


1. Prove that a discrete group G consisting of rotations about the origin is cyclic and is generated
by $\rho_\theta$ where $\theta$ is the smallest angle of rotation in G.
2. Let G be a subgroup of M which contains rotations about two different points. Prove algebraically
that G contains a translation.
3. Let (a, b) be a lattice basis of a lattice L in $\mathbb{R}^2$. Prove that every other lattice basis has the
form (a', b') = (a, b)P, where P is a 2 × 2 integer matrix whose determinant is ±1.
4. Determine the point group for each of the patterns depicted in Figure (4.16).
5. (a) Let B be a square of side length a, and let $\varepsilon > 0$. Let S be a subset of B such that the
distance between any two points of S is $\ge \varepsilon$. Find an explicit upper bound for the
number of elements in S.
(b) Do the same thing for a box B in $\mathbb{R}^n$.
6. Prove that the subgroup of $\mathbb{R}^+$ generated by 1 and $\sqrt{2}$ is dense in $\mathbb{R}^+$.
7. Prove that every discrete subgroup of O is finite.
8. Let G be a discrete subgroup of M. Prove that there is a point $p_0$ in the plane which is not
fixed by any element of G except the identity.
9. Prove that the group of symmetries of the frieze pattern

[frieze pattern figure not reproduced]

is isomorphic to the direct product $C_2 \times C_\infty$ of a cyclic group of order 2 and an infinite
cyclic group.
10. Let G be the group of symmetries of the frieze pattern [frieze pattern figure not reproduced].
(a) Determine the point group $\bar{G}$ of G.
(b) For each element $\bar{g} \in \bar{G}$, and each element $g \in G$ which represents $\bar{g}$, describe the
action of g geometrically.
(c) Let H be the subgroup of translations in G. Determine [G : H].


11. Let G be the group of symmetries of the pattern

[pattern figure not reproduced]

Determine the point group of G.
12. Let G be the group of symmetries of an equilateral triangular lattice L. Find the index in
G of the subgroup $T \cap G$.
13. Let G be a discrete group in which every element is orientation-preserving. Prove that
the point group $\bar{G}$ is a cyclic group of rotations and that there is a point p in the plane
such that the set of group elements which fix p is isomorphic to $\bar{G}$.
14. With each of the patterns shown, find a pattern with the same type of symmetry in
(4.15).
15. Let N denote the group of rigid motions of the line $\ell = \mathbb{R}^1$. Some elements of N are
$t_a: x \mapsto x + a$, $a \in \mathbb{R}$, $s: x \mapsto -x$.
(a) Show that $\{t_a, t_as\}$ are all of the elements of N, and describe their actions on $\ell$
geometrically.
(b) Compute the products $t_at_b$, $st_a$, $ss$.
(c) Find all discrete subgroups of N which contain a translation. It will be convenient to
choose your origin and unit length with reference to the particular subgroup. Prove
that your list is complete.
*16. Let N' be the group of motions of an infinite ribbon
$R = \{(x, y) \mid -1 \le y \le 1\}$.
It can be viewed as a subgroup of the group M. The following elements are in N':
$t_a: (x, y) \mapsto (x + a, y)$
$s: (x, y) \mapsto (-x, y)$
$r: (x, y) \mapsto (x, -y)$
$\rho: (x, y) \mapsto (-x, -y)$
(a) Show that these elements generate N', and describe the elements of N' as products.
(b) State and prove analogues of (2.5) for these motions.
(c) A frieze pattern is any pattern on the ribbon which is periodic and not degenerate, in
the sense that its group of symmetries is discrete. Since it is periodic, its group of
symmetries will contain a translation. Some sample patterns are depicted in the text
(1.3, 1.4, 1.6, 1.7). Classify the symmetry groups which arise, identifying those
which differ only in the choice of origin and unit length on the ribbon. I suggest that
you begin by trying to make patterns with different kinds of symmetry. Please make
a careful case analysis when proving your results. A suitable format would be as follows:
Let G be a discrete subgroup containing a translation.
Case 1: Every element of G is a translation. Then ...,
Case 2: G contains the rotation $\rho$ but no orientation-reversing symmetry. Then ...,
and so on.


Let L be a lattice of R’, and let a, b be linearly independent vectors lying in L. Show that 

the subgroup L’ = {ma + nb | m,n € Z} of L generated by a, b has finite index, and 

that the index is the number of lattice points in the parallelogram whose vertices are 

0.a.b,a + b and which are not on the “far edges” [a,a + b] and [b,a + b]. (So. 9 is 

included, and so are points which lie on the edges (0, a], [0, b], except for the points a,b 

themselves. ) 

(a) find a subset F of the plane which ts not fixed by any motion m € M. 

(b) Let G be a discrete group of motions. Prove that the union S of all images of F by 
elements of G is a subset whose group of symmetries G' contains G. 

(c) Show by an example that G' may be larger than G. 


*(d) Prove that there exists a subset F such that G’ = G. 


Let G be a lattice group such that no element g # | fixes any point of the plane. Prove 

that G is generated by two translations, or else by one translation and one glide. 

Let G be a lattice group whose point group is $D_1 = \{1, r\}$.

(a) Show that the glide lines and the lines of reflection of G are all parallel.

(b) Let $L = L_G$. Show that L contains nonzero vectors $a = (a_1, 0)^t$, $b = (0, b_2)^t$.

(c) Let a and b denote the smallest vectors of the type indicated in (b). Then either (a, b)
or (a, c) is a lattice basis for L, where $c = \frac{1}{2}(a + b)$.

(d) Show that if coordinates in the plane are chosen so that the x-axis is a glide line,
then G contains one of the elements $g = r$ or $g = t_{\frac{1}{2}a} r$. In either case, show that
$G = L \cup Lg$.

(e) There are four possibilities described by the dichotomies (c) and (d). Show that there
are only three different kinds of group.

Prove that if the point group of a lattice group G is $C_6$, then $L = L_G$ is an equilateral
triangular lattice, and G is the group of all rotational symmetries of L about the origin.

Prove that if the point group of a lattice group G is $D_6$, then $L = L_G$ is an equilateral
triangular lattice, and G is the group of all symmetries of L.


Prove that symmetry groups of the figures in Figure (4.16) exhaust the possibilities. 


5. Abstract Symmetry: Group Operations 


1. Determine the group of automorphisms of the following groups.
(a) $C_4$ (b) $C_6$ (c) $C_2 \times C_2$

2. Prove that (5.4) is an equivalence relation.

3. Let S be a set on which G operates. Prove that the relation $s \sim s'$ if $s' = gs$ for some
$g \in G$ is an equivalence relation.

4. Let $\varphi: G \to G'$ be a homomorphism, and let S be a set on which G' operates. Show
how to define an operation of G on S, using the homomorphism $\varphi$.
Let $G = D_4$ be the dihedral group of symmetries of the square.

(a) What is the stabilizer of a vertex? an edge? 

(b) G acts on the set of two elements consisting of the diagonal lines. What is the stabi- 
lizer of a diagonal? 

In each of the figures in exercise 14 of Section 4, find the points which have nontrivial 

stabilizers, and identify the stabilizers. 

Let G be a discrete subgroup of M. 

(a) Prove that the stabilizer $G_p$ of a point p is finite.

(b) Prove that the orbit $O_p$ of a point p is a discrete set, that is, that there is a number
$\epsilon > 0$ so that the distance between two distinct points of the orbit is at least $\epsilon$.

(c) Let B, B' be two bounded regions in the plane. Prove that there are only finitely
many elements $g \in G$ so that $gB \cap B'$ is nonempty.


Let $G = GL_n(\mathbb{R})$ operate on the set $S = \mathbb{R}^n$ by left multiplication.

(a) Describe the decomposition of S into orbits for this operation.
(b) What is the stabilizer of $e_1$?


Decompose the set $\mathbb{C}^{2 \times 2}$ of 2 × 2 complex matrices for the following operations of
$GL_2(\mathbb{C})$:
(a) Left multiplication 


*(b) Conjugation 


(a) Let $S = \mathbb{R}^{m \times n}$ be the set of real m × n matrices, and let $G = GL_m(\mathbb{R}) \times GL_n(\mathbb{R})$.
Prove that the rule $(P, Q), A \rightsquigarrow PAQ^{-1}$ defines an operation of G on S.

(b) Describe the decomposition of S into G-orbits.

(c) Assume that m < n. What is the stabilizer of the matrix $[I \mid 0]$?


(a) Describe the orbit and the stabilizer of the matrix $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ under conjugation in
$GL_2(\mathbb{R})$.

(b) Interpreting the matrix in $GL_2(\mathbb{F}_5)$, find the order (the number of elements) of the
orbit.

(a) Define automorphism of a field. 

(b) Prove that the field Q of rational numbers has no automorphism except the identity. 

(c) Determine Aut F, when $F = \mathbb{Q}[\sqrt{2}]$.


6. The Operation on Cosets 


1. What is the stabilizer of the coset aH for the operation of G on G/H?


2. Let G be a group, and let H be the cyclic subgroup generated by an element x of G. 


Show that if left multiplication by x fixes every coset of H in G, then H is a normal 

subgroup. 

(a) Exhibit the bijective map (6.4) explicitly, when G is the dihedral group $D_4$ and S is
the set of vertices of a square.

(b) Do the same for $D_n$ and the vertices of a regular n-gon.

(a) Describe the stabilizer H of the index 1 for the action of the symmetric group $G = S_n$
on {1, ..., n} explicitly.

(b) Describe the cosets of H in G explicitly for this action. 

(c) Describe the map (6.4) explicitly. 
5. Describe all ways in which $S_3$ can operate on a set of four elements.

6. Prove Proposition (6.5). 

7. A map $\varphi: S \to S'$ of G-sets is called a homomorphism of G-sets if $\varphi(gs) = g\varphi(s)$ for all
$s \in S$ and $g \in G$. Let $\varphi$ be such a homomorphism. Prove the following:
(a) The stabilizer $G_{\varphi(s)}$ contains the stabilizer $G_s$.
(b) The orbit of an element $s \in S$ maps onto the orbit of $\varphi(s)$.


7. The Counting Formula 


1. Use the counting formula to determine the orders of the group of rotational symmetries 
of a cube and of the group of rotational symmetries of a tetrahedron. 

2. Let G be the group of rotational symmetries of a cube C. Two regular tetrahedra A, A’ 
can be inscribed in C, each using half of the vertices. What is the order of the stabilizer 
of A? 

3. Compute the order of the group of symmetries of a dodecahedron, when orientation- 
reversing symmetries such as reflections in planes, as well as rotations, are allowed. Do 
the same for the symmetries of a cube and of a tetrahedron. 

4. Let G be the group of rotational symmetries of a cube, let $S_v$, $S_e$, $S_f$ be the sets of
vertices, edges, and faces of the cube, and let $H_v$, $H_e$, $H_f$ be the stabilizers of a vertex, an
edge, and a face. Determine the formulas which represent the decomposition of each of
the three sets into orbits for each of the subgroups.

5. Let $G \supset H \supset K$ be groups. Prove the formula [G : K] = [G : H][H : K] without the
assumption that G is finite.

6. (a) Prove that if H and K are subgroups of finite index of a group G, then the intersection
$H \cap K$ is also of finite index.
(b) Show by example that the index $[H : H \cap K]$ need not divide [G : K].


8. Permutation Representations 


1. Determine all ways in which the tetrahedral group T (see (9.1)) can operate on a set of 
two elements. 

2. Let S be a set on which a group G operates, and let $H = \{g \in G \mid gs = s \text{ for all } s \in S\}$.
Prove that H is a normal subgroup of G. 

3. Let G be the dihedral group of symmetries of a square. Is the action of G on the vertices 
a faithful action? on the diagonals? 

4. Suppose that there are two orbits for the operation of a group G on a set S, and that they 
have orders m,n respectively. Use the operation to define a homomorphism from G to 
the product $S_m \times S_n$ of symmetric groups.

5. A group G operates faithfully on a set S of five elements, and there are two orbits, one of 
order 3 and one of order 2. What are the possibilities for G? 

6. Complete the proof of Proposition (8.2). 

7. Let $F = \mathbb{F}_3$. There are four one-dimensional subspaces of the space of column vectors
$F^2$. Describe them. Left multiplication by an invertible matrix permutes these subspaces.
Prove that this operation defines a homomorphism $\varphi: GL_2(F) \to S_4$. Determine the
kernel and image of this homomorphism.




*8. For each of the following groups, find the smallest integer n such that the group has a
faithful operation on a set with n elements.
(a) the quaternion group H (b) $D_4$ (c) $D_6$


9. Finite Subgroups of the Rotation Group 


1. Describe the orbits of poles for the group of rotations of an octahedron and of an
icosahedron.

2. Identify the group of symmetries of a baseball, taking the stitching into account and
allowing orientation-reversing symmetries.

3. Let O be the group of rotations of a cube. Determine the stabilizer of a diagonal line
connecting opposite vertices.

4. Let G = O be the group of rotations of a cube, and let H be the subgroup carrying one
of the two inscribed tetrahedra to itself (see exercise 2, Section 7). Prove that H = T.

5. Prove that the icosahedral group has a subgroup of order 10.

6. Determine all subgroups of the following groups:
(a) T (b) I

7. Explain why the groups of symmetries of the cube and octahedron, and of the
dodecahedron and icosahedron, are equal.

*8. (a) The 12 points $(\pm 1, \pm\alpha, 0)$, $(0, \pm 1, \pm\alpha)$, $(\pm\alpha, 0, \pm 1)$ form the vertices of a regular
icosahedron if $\alpha$ is suitably chosen. Verify this, and determine $\alpha$.

(b) Determine the matrix of the rotation through the angle $2\pi/5$ about the origin in $\mathbb{R}^2$.

(c) Determine the matrix of the rotation of $\mathbb{R}^3$ through the angle $2\pi/5$ about the axis
containing the point $(1, \alpha, 0)$.

*9. Prove the crystallographic restriction for three-dimensional crystallographic groups: A
rotational symmetry of a crystal has order 2, 3, 4, or 6.


Miscellaneous Problems 


1. Describe completely the following groups:
(a) Aut $D_4$ (b) Aut H, where H is the quaternion group

2. (a) Prove that the set Aut G of automorphisms of a group G forms a group.

(b) Prove that the map $\varphi: G \to \operatorname{Aut} G$ defined by $g \rightsquigarrow$ (conjugation by g) is a
homomorphism, and determine its kernel.

(c) The automorphisms which are conjugation by a group element are called inner
automorphisms. Prove that the set of inner automorphisms, the image of $\varphi$, is a normal
subgroup of Aut G.

3. Determine the quotient group Aut H/Int H for the quaternion group H.

*4. Let G be a lattice group. A fundamental domain D for G is a bounded region in the
plane, bounded by piecewise smooth curves, such that the sets gD, $g \in G$, cover the
plane without overlapping except along the edges. We assume that D has finitely many
connected components.

(a) Find fundamental domains for the symmetry groups of the patterns illustrated in
exercise 14 of Section 4.

(b) Show that any two fundamental domains D, D' for G can be cut into finitely many
congruent pieces of the form $gD \cap D'$ or $D \cap gD'$ (see exercise 7, Section 5).
(c) Conclude that D and D' have the same area. (It may happen that the boundary
curves intersect infinitely often, and this raises some questions about the definition of 
area. Disregard such points in your answer.) 

Let G be a lattice group, and let $p_0$ be a point in the plane which is not fixed by any
element of G. Let $S = \{gp_0 \mid g \in G\}$ be the orbit of $p_0$. The plane can be divided into
polygons, each one containing a single point of S, as follows: The polygon $\Delta_p$ containing
p is the set of points q whose distance from p is the smallest distance to any point of S:


$\Delta_p = \{q \in \mathbb{R}^2 \mid \operatorname{dist}(q, p) \le \operatorname{dist}(q, p') \text{ for all } p' \in S\}$.


(a) Prove that $\Delta_p$ is a polygon.

(b) Prove that $\Delta_p$ is a fundamental domain for G.

(c) Show that this method works for all discrete subgroups of M, except that the domain
$\Delta_p$ which is constructed need not be a bounded set.

(d) Prove that $\Delta_p$ is bounded if and only if the group is a lattice group.

(a) Let $G' \subset G$ be two lattice groups. Let D be a fundamental domain for G. Show that
a fundamental domain D' for G' can be constructed out of finitely many translates
gD of D.

(b) Show that $[G : G'] < \infty$ and that $[G : G'] = \operatorname{area}(D')/\operatorname{area}(D)$.

(c) Compute the index $[G : L_G]$ for each of the patterns (4.16).

Let G be a finite group operating on a finite set S. For each element $g \in G$, let $S^g$ denote
the subset of elements of S fixed by g: $S^g = \{s \in S \mid gs = s\}$.

(a) We may imagine a true-false table for the assertion that gs = s, say with rows indexed
by elements of G and columns indexed by elements of S. Construct such a table for
the action of the dihedral group $D_3$ on the vertices of a triangle.

(b) Prove the formula $\sum_{s \in S} |G_s| = \sum_{g \in G} |S^g|$.

(c) Prove Burnside’s Formula:


$|G| \cdot (\text{number of orbits}) = \sum_{g \in G} |S^g|$.


There are $70 = \binom{8}{4}$ ways to color the edges of an octagon, making four black and four
white. The group $D_8$ operates on this set of 70, and the orbits represent equivalent
colorings. Use Burnside’s Formula to count the number of equivalence classes.

Let G be a group of order n which operates nontrivially on a set of order r. Prove that if
n > r!, then G has a proper normal subgroup.
The more to do or to prove, the easier the doing or the proof.


James Joseph Sylvester 


1. THE OPERATIONS OF A GROUP ON ITSELF


By an operation of a group G on itself, we mean that in the definition of the opera- 
tion, G plays the role both of the group and of the set on which it operates. Any 
group operates on itself in several ways, two of which we single out here. The first is 
left multiplication: 


(1.1) $G \times G \to G$
$(g, x) \rightsquigarrow gx$.


This is obviously a transitive operation of G on G, that is, G forms a single orbit, 
and the stabilizer of any element is the identity subgroup {1}. So the action is faith- 
ful, and the homomorphism 


(1.2) $G \to \operatorname{Perm}(G)$
$g \rightsquigarrow m_g$ = left multiplication by g
defined in Chapter 5, Section 8 is injective.


(1.3) Theorem. Cayley’s Theorem: Every finite group G is isomorphic to a subgroup
of a permutation group. If G has order n, then it is isomorphic to a subgroup
of the symmetric group $S_n$.


Proof. Since the operation by left multiplication is faithful, G is isomorphic to
its image in Perm(G). If G has order n, then Perm(G) is isomorphic to $S_n$. □

Though Cayley’s Theorem is intrinsically interesting, it is not especially useful for
computation because $S_n$, having order n!, is too large in comparison with n.
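
As a small illustration of the homomorphism (1.2), the following Python sketch builds the permutation $m_g$ for every element of one concrete group and checks that the map is an injective homomorphism. The choice of $S_3$ as the test group, the encoding of permutations as tuples, and the helper names `compose` and `left_mult` are assumptions made for this example only.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

# Assumed test group: S_3, written as tuples acting on {0, 1, 2}.
G = sorted(permutations(range(3)))
index = {g: k for k, g in enumerate(G)}   # label the n = 6 elements 0..5

def left_mult(g):
    """m_g as a permutation of the labels 0..n-1: label(x) -> label(gx)."""
    return tuple(index[compose(g, x)] for x in G)

image = {g: left_mult(g) for g in G}

# Each m_g is a permutation of the n labels.
is_perm = all(sorted(m) == list(range(len(G))) for m in image.values())
# Faithful operation: distinct group elements give distinct permutations.
is_injective = len(set(image.values())) == len(G)
# Homomorphism property: m_{gh} = m_g * m_h.
is_hom = all(image[compose(g, h)] == compose(image[g], image[h])
             for g in G for h in G)
print(is_perm, is_injective, is_hom)
```

Here a group of order 6 is realized inside the permutations of six labels, a group of order 720, which shows concretely how wasteful the embedding is.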

The second operation we will consider is more subtle. It is conjugation, the 
map $G \times G \to G$, defined by


(1.4) $(g, x) \rightsquigarrow gxg^{-1}$.


For obvious reasons, we will not use multiplicative notation for this operation. You
should verify the axioms (5.1) in Chapter 5, introducing a temporary notation such
as $g * x$ to denote the conjugate $gxg^{-1}$.

The stabilizer of an element $x \in G$ for the operation of conjugation has a special
name. It is called the centralizer of x and is denoted by Z(x):


(1.5) $Z(x) = \{g \in G \mid gxg^{-1} = x\} = \{g \in G \mid gx = xg\}$.


The centralizer is the set of group elements which commute with x. Note that
$x \in Z(x)$, because x commutes with itself.

The orbit of x for the operation of conjugation is called the conjugacy class of
x. It consists of all conjugate elements $gxg^{-1}$. We often write the conjugacy class as


(1.6) $C_x = \{x' \in G \mid x' = gxg^{-1} \text{ for some } g \in G\}$.


By the Counting Formula [Chapter 5 (7.2)], $|G| = |C_x|\,|Z(x)|$.
Since the conjugacy classes are orbits for a group operation, they partition G.
This gives us what is called the Class Equation for a finite group [see Chapter 5 (7.3)]:


(1.7) $|G| = \sum_{\text{conjugacy classes } C} |C|$.


If we number the conjugacy classes, say as $C_i$, $i = 1, \ldots, k$, then this formula reads

$|G| = |C_1| + \cdots + |C_k|$.


However there is some danger of confusion, because the subscript i in $C_i$ is an index,
while the notation $C_x$ as used above stands for the conjugacy class containing the
element x of G. In particular, $C_1$ has two meanings. Perhaps it will be best to list the
conjugacy class of the identity element 1 of G first. Then the two interpretations of
$C_1$ will agree.

Notice that the identity element is left fixed by all $g \in G$. Thus $C_1$ consists of
the element 1 alone. Note also that each term on the right side of (1.7), being the
order of an orbit, divides the left side. This is a strong restriction on the combinations
of integers which may occur in such an equation.


(1.8) The numbers on the right side of the Class Equation divide the
order of the group, and at least one of them is equal to 1.


For example, the conjugacy classes in the dihedral group $D_3$, presented as in
Chapter 5 (3.6), are the following three subsets:


$\{1\}$, $\{x, x^2\}$, $\{y, xy, x^2y\}$.

The two rotations $x$, $x^2$ are conjugate, as are the three reflections. The Class Equation
for $D_3$ is


(1.9) $6 = 1 + 2 + 3$.
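
The class equation (1.9) can be checked by brute force. The sketch below is only an illustration: it realizes $D_3$ as the six permutations of the vertices 0, 1, 2 of the triangle (an assumed encoding), computes the conjugacy classes as orbits of conjugation, and confirms the Counting Formula $|G| = |C_x|\,|Z(x)|$ for every element. The helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugacy_classes(G):
    """Partition G into orbits for the operation g * x = g x g^{-1}."""
    classes, seen = [], set()
    for x in G:
        if x not in seen:
            cls = {compose(compose(g, x), inverse(g)) for g in G}
            classes.append(cls)
            seen |= cls
    return classes

def centralizer(G, x):
    """Z(x): the elements of G which commute with x."""
    return [g for g in G if compose(g, x) == compose(x, g)]

# Assumed realization: D_3 acting on the triangle's vertices 0, 1, 2.
G = sorted(permutations(range(3)))
classes = conjugacy_classes(G)
class_sizes = sorted(len(c) for c in classes)
counting_ok = all(len(G) == len(c) * len(centralizer(G, x))
                  for c in classes for x in c)
print(class_sizes, counting_ok)
```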


Recall from Chapter 2 (4.10) that the center of a group G is the set Z of ele- 
ments which commute with all elements of the group: 


$Z = \{g \in G \mid gx = xg \text{ for all } x \in G\}$.


Now the conjugacy class of an element x consists of that element alone if and only if
$x = gxg^{-1}$ for all $g \in G$. This means that x is in the center. Thus the elements of
the center are represented by 1 on the right side of the Class Equation.

The next proposition follows directly from the definitions.


(1.10) Proposition. An element x is in the center of a group G if and only if its
centralizer Z(x) is the whole group. □


One case in which the Class Equation (1.7) can be used effectively is when the
order of G is a positive power of a prime p. Such a group is called a p-group. Here 
are a few applications of the Class Equation to p-groups. 


(1.11) Proposition. The center of a p-group G has order > 1. 


Proof. The left side of (1.7) is a power of p, say $p^e$. Also, every term on the
right side is a power of p too, because it divides $p^e$. We want to show that some
group element $x \neq 1$ is in the center, which is the same as saying that more than one
term on the right side of (1.7) is equal to 1. Now the terms other than 1, being positive
powers of p, are divisible by p. Suppose that the class $C_1$ made the only contribution
of 1 to the right side. Then the equation would read


$p^e = 1 + \sum (\text{multiples of } p)$,


which is impossible unless e = 0. □
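
Proposition (1.11) can be observed on a small 2-group. The sketch below, written under stated assumptions, generates the dihedral group of order 8 as permutations of the vertices 0, 1, 2, 3 of a square (the generators `r` and `s` are one possible choice), then lists its class equation and its center. The number of terms equal to 1 in the class equation should equal the order of the center.

```python
def compose(p, q):
    """Return the permutation p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generate(gens):
    """Closure of a finite set of permutations under left multiplication."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

r = (1, 2, 3, 0)   # quarter turn: vertex i -> i + 1 (mod 4)
s = (0, 3, 2, 1)   # reflection fixing the vertices 0 and 2
G = generate([r, s])

center = {z for z in G if all(compose(z, g) == compose(g, z) for g in G)}
classes = {frozenset(compose(compose(g, x), inverse(g)) for g in G) for x in G}
class_sizes = sorted(len(c) for c in classes)
print(len(G), class_sizes, len(center))
```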


The argument used in this proof can be turned around and abstracted to give 
the following important Fixed Point Theorem for actions of p-groups: 


(1.12) Proposition. Let G be a p-group, and let S be a finite set on which G operates.
Assume that the order of S is not divisible by p. Then there is a fixed point for
the action of G on S, that is, an element $s \in S$ whose stabilizer is the whole group. □


(1.13) Proposition. Every group of order $p^2$ is abelian.


Proof. Let G be a group of order $p^2$. We will show that for every $x \in G$, the
centralizer Z(x) is the whole group. Proposition (1.10) will then finish the proof. So
let $x \in G$. If x is in the center Z, then Z(x) = G as claimed. If $x \notin Z$, then Z(x) is
strictly larger than Z, because it contains Z and also contains the element x. Now the
orders of Z and Z(x) divide $|G| = p^2$, and Proposition (1.11) tells us that |Z| is at
least p. The only possibility is that $|Z(x)| = p^2$. Hence Z(x) = G, and x was in the
center after all. □


There are nonabelian groups of order $p^3$. The dihedral group $D_4$, for example,
has order 8.
Let us use (1.13) to classify groups of order $p^2$.


(1.14) Corollary. Every group of order $p^2$ is of one of the following types:
(i) a cyclic group of order $p^2$;
(ii) a product of two cyclic groups of order p.


Proof. Since the order of an element divides $p^2$, there are two cases to
consider:


Case 1: G contains an element of order $p^2$ and is therefore a cyclic group.


Case 2: Every element x of G except the identity has order p. Let x, y be two elements
different from 1, and let $H_1$, $H_2$ be the cyclic groups of order p generated by x
and y respectively. We may choose y so that it is not a power of x. Then since
$y \notin H_1$, $H_1 \cap H_2$ is smaller than $H_2$, which has order p. So $H_1 \cap H_2 = \{1\}$. Also,
the subgroups $H_i$ are normal because G is abelian. Since $y \notin H_1$, the group $H_1H_2$ is
strictly larger than $H_1$, and its order divides $p^2$. Thus $H_1H_2 = G$. By Chapter 2
(8.6), $G \approx H_1 \times H_2$. □


The number of possibilities for groups of order $p^n$ increases rapidly with n.
There are five isomorphism classes of groups of order 8, and 14 classes of groups of 
order 16. 


2. THE CLASS EQUATION OF THE ICOSAHEDRAL GROUP 


In this section we determine the conjugacy classes in the icosahedral group I of rotational
symmetries of a dodecahedron, and use them to study this very interesting
group. As we have seen, the order of the icosahedral group is 60. It contains rotations
by multiples of $2\pi/5$ about the centers of the faces of the dodecahedron, by
multiples of $2\pi/3$ about the vertices, and by $\pi$ about the centers of the edges. Each
of the 20 vertices has a stabilizer of order 3, and opposite vertices have the same
stabilizer. Thus there are 10 subgroups of order 3, namely the stabilizers of the vertices.
Each subgroup of order 3 contains two elements of order 3, and the intersection of
any two of these subgroups consists of the identity element alone. So I contains
10 × 2 = 20 elements of order 3. Similarly, the faces have stabilizers of order 5,
and there are six such stabilizers, giving us 6 × 4 = 24 elements of order 5. There
are 15 stabilizers of edges, and these stabilizers have order 2. So there are 15 elements
of order 2. Finally, there is one element of order 1. Since


(2.1) $60 = 1 + 15 + 20 + 24$,


we have listed all elements of the group. 
Equation (2.1) is obtained by partitioning the group according to the orders of
the elements. It is closely related to the Class Equation, but we can see that (2.1) is 
not the Class Equation itself, because 24, which appears on the right side, does not 
divide 60. On the other hand, we do know that conjugate elements have the same 
order. So the Class Equation is obtained by subdividing this partition of G still fur- 
ther. Also, note that the subgroups of order 3 are all conjugate. This is a general 
property of group operations, because they are the stabilizers of the vertices, which 
form a single orbit [Chapter 5 (6.5)]. The same is true for the subgroups of order 5 
and for those of order 2. 

Clearly the 15 elements of order 2, being the nontrivial elements in conjugate 
subgroups of order 2, form one conjugacy class. What about the elements of order 
3? Let x denote a counterclockwise rotation by $2\pi/3$ about a vertex v. Though x will
be conjugate to rotation with the same angle about any other vertex [Chapter 5
(6.5)], it is not so clear whether or not x is conjugate to $x^2$. Perhaps the first guess
would be that x and $x^2$ are not conjugate.

Let v' denote the vertex opposite to v, and let x' be the counterclockwise rotation
by $2\pi/3$ about v'. So x and x' are conjugate elements of the group. Notice that
the counterclockwise rotation x about v is the same motion as the clockwise rotation
by $2\pi/3$ about the opposite vertex v'. In other words $x = (x')^{-1}$, and since x has
order 3, $x^2 = x^{-1} = x'$. This shows that x and $x^2$ are conjugate after all. It follows that
all the elements of order 3 are conjugate. Similarly, the 12 rotations by $2\pi/5$ and
$-2\pi/5$ are conjugate. They are not conjugate to the remaining 12 rotations by $4\pi/5$,
$-4\pi/5$ of order 5. (One reason, as we have already remarked, is that the order of a
conjugacy class divides the order of the group, and 24 does not divide 60.) Thus there
are two conjugacy classes of elements of order 5, and the Class Equation is


(2.2) $60 = 1 + 15 + 20 + 12 + 12$.


We will now use this Class Equation to prove the following theorem. 


(2.3) Theorem. The icosahedral group I has no proper normal subgroup.


A group G is called a simple group if it is not the trivial group and if it
contains no proper normal subgroup (no normal subgroup other than {1} and G).
Thus the theorem can be restated as follows: 


(2.4) The icosahedral group is a simple group. 


Cyclic groups of prime order contain no proper subgroup at all and are there- 
fore simple groups. All other groups, except for the trivial group, contain proper 
subgroups, though not necessarily normal ones. We should emphasize that this use 
of the word simple does not imply “uncomplicated.” Its meaning here is roughly “not
compound.”


Proof of Theorem (2.3). The proof of the following lemma is straightforward: 

(2.5) Lemma.


(a) If a normal subgroup N of a group G contains an element x, then it contains
the conjugacy class $C_x$ of x in G. In other words, a normal subgroup is a union
of conjugacy classes.

(b) The order of a normal subgroup N of G is the sum of the orders of the conjugacy
classes which it contains. □


We now apply this lemma. The order of a proper normal subgroup of the icosahedral
group is a proper divisor of 60 and is also the sum of some of the terms on
the right side of the Class Equation (2.2), including the term 1. It happens that there
is no such integer. This proves the theorem. □
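
The arithmetic claim at the end of this proof is easy to confirm by enumeration. The following sketch, offered only as a check, forms every sum of the term 1 together with some of the remaining terms 15, 20, 12, 12 of (2.2) and keeps the sums that divide 60. The variable names are illustrative.

```python
from itertools import combinations

# Terms of the Class Equation (2.2) other than the class of the identity.
nontrivial_terms = [15, 20, 12, 12]

possible_orders = set()
for k in range(len(nontrivial_terms) + 1):
    for chosen in combinations(nontrivial_terms, k):
        total = 1 + sum(chosen)        # a normal subgroup contains the identity
        if 60 % total == 0:            # its order must divide 60
            possible_orders.add(total)

print(sorted(possible_orders))
```

Only the orders 1 and 60 survive, which correspond to the trivial subgroup and the whole group, so no proper divisor of 60 can be the order of a normal subgroup.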


(2.6) Theorem. The icosahedral group is isomorphic to the alternating group $A_5$.


Proof. To describe this isomorphism, we need to find a set S of five elements
on which I operates. One such set consists of the five cubes which can be inscribed
into a dodecahedron, one of which is illustrated below:


(2.7) Figure. One of the cubes inscribed in a dodecahedron.


The group I operates on this set of cubes S, and this operation defines a homomorphism
$\varphi: I \to S_5$, the associated permutation representation. The map $\varphi$ is our
isomorphism from I to its image $A_5$. To show that it is an isomorphism, we will use the
fact that I is a simple group, but we need very little information about the operation
itself.

Since the kernel of $\varphi$ is a normal subgroup of I and since I is a simple group,
ker $\varphi$ is either {1} or I. To say ker $\varphi$ = I would mean that the operation of I on the
set of five cubes was the trivial operation, which it is not. Therefore ker $\varphi$ = {1},
and $\varphi$ is injective, defining an isomorphism of I onto its image in $S_5$.

Let us denote the image in $S_5$ by I too. We restrict the sign homomorphism
$S_5 \to \{\pm 1\}$ to I, obtaining a homomorphism $I \to \{\pm 1\}$. If this homomorphism
were surjective, its kernel would be a normal subgroup of I of order 30 [Chapter 2
(6.15)]. This is impossible because I is simple. Therefore the restriction is the trivial
homomorphism, which just means that I is contained in the kernel $A_5$ of the sign
homomorphism. Since both groups have order 60, $I = A_5$. □
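
Since I is isomorphic to $A_5$, the class equation (2.2) should also be the class equation of $A_5$. The sketch below checks this by brute force. The encoding of $A_5$ as even permutations of {0, ..., 4}, the inversion-count parity test, and the helper names are assumptions of this example.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

A5 = [p for p in permutations(range(5)) if is_even(p)]

# Conjugacy classes as orbits of conjugation; the set removes duplicates.
classes = {frozenset(compose(compose(g, x), inverse(g)) for g in A5) for x in A5}
class_sizes = sorted(len(c) for c in classes)
print(len(A5), class_sizes)
```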


3. OPERATIONS ON SUBSETS 


Whenever a group G operates on a set S, there is also an operation on subsets. If
$U \subset S$ is a subset, then


(3.1) $gU = \{gu \mid u \in U\}$


is another subset of S. The axioms for an operation are clearly verified. So G oper- 
ates on the set of subsets of S. We can consider the operation on subsets of a given 
order if we want to do so. Since multiplication by g is a permutation of S, the sub- 
sets U and gU have the same order. 

For example, let O be the octahedral group of 24 rotations of a cube, and let S 
be the set of vertices of the cube. Consider the operation of O on subsets of order 2 
of S, that is, on unordered pairs of vertices. There are 28 such pairs, and they form 
three orbits for the group: 

(i) {pairs of vertices on an edge}; 
(ii) {pairs which are opposite on a face of the cube}; 
(iii) {pairs which are opposite on the cube}. 
These orbits have orders 12, 12, and 4 respectively: 28 = 12 + 12 + 4. 

The stabilizer of a subset U is the set of group elements g such that gU = U.
Thus the stabilizer of a pair of opposite vertices on a face contains two elements:
the identity and the rotation by $\pi$ about the face. This agrees with the counting
formula: $24 = 2 \cdot 12$.
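
The orbit count for pairs of vertices can be reproduced computationally. In the sketch below, the cube has vertices $(\pm 1, \pm 1, \pm 1)$ and the rotation group is generated by quarter turns about the z-axis and the x-axis; these coordinates, the two generators, and all helper names are assumptions chosen for the illustration.

```python
from itertools import combinations, product

def compose(p, q):
    """Return the permutation p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Closure of a finite set of permutations under left multiplication."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

vertices = list(product((-1, 1), repeat=3))
label = {v: i for i, v in enumerate(vertices)}

def as_permutation(rotation):
    """Record a rotation of space as a permutation of the 8 vertex labels."""
    return tuple(label[rotation(v)] for v in vertices)

quarter_z = as_permutation(lambda v: (-v[1], v[0], v[2]))
quarter_x = as_permutation(lambda v: (v[0], -v[2], v[1]))
O = generate([quarter_z, quarter_x])

def act(g, U):
    """The subset gU for a set U of vertex labels."""
    return frozenset(g[u] for u in U)

pairs = [frozenset(c) for c in combinations(range(8), 2)]
orbits = {frozenset(act(g, U) for g in O) for U in pairs}
orbit_sizes = sorted(len(orbit) for orbit in orbits)

# Two vertices which are opposite on the face z = -1.
face_diagonal = frozenset({label[(-1, -1, -1)], label[(1, 1, -1)]})
stabilizer = [g for g in O if act(g, face_diagonal) == face_diagonal]
print(len(O), len(pairs), orbit_sizes, len(stabilizer))
```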

Note this important point once more: The equality gU = U does not mean that 
g leaves the elements in U fixed, but rather that g permutes the elements within U, 
that is, that $gu \in U$ whenever $u \in U$.


(3.2) Proposition. Let H be a group which operates on a set S, and let U be a subset
of S. Then H stabilizes U if and only if U is a union of H-orbits.


This proposition just restates the fact that the H-orbit of an element $u \in U$ is the set
of all elements hu. If H stabilizes U, then U contains the H-orbit of any of its
elements. □


Let’s consider the case that G operates by left multiplication on the subsets of 
G. Any subgroup H of G is a subset, and its orbit consists of the left cosets. This 
operation of G on cosets was defined in Chapter 5 (6.1). But any subset of G has an 
orbit. 
(3.3) Example. Let $G = D_3$ be the dihedral group of symmetries of an equilateral
triangle, presented as usual:


$G = \{x^i y^j \mid 0 \le i \le 2,\ 0 \le j \le 1,\ x^3 = 1,\ y^2 = 1,\ yx = x^2 y\}$.

This group contains 15 subsets of order 2, and we can decompose this set of 15 into
orbits for left multiplication. There are three subgroups of order 2:


(3.4) $H_1 = \{1, y\}$, $H_2 = \{1, xy\}$, $H_3 = \{1, x^2y\}$.


Their cosets form three orbits of order 3. The other six subsets of order 2 form a single
orbit: 15 = 3 + 3 + 3 + 6. The orbit of six is


(3.5) $\{1, x\}$, $\{x, x^2\}$, $\{x^2, 1\}$, $\{y, x^2y\}$, $\{xy, y\}$, $\{x^2y, xy\}$.
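
The decomposition 15 = 3 + 3 + 3 + 6 can be verified directly. This sketch assumes $D_3$ is encoded as the six permutations of the triangle's vertices 0, 1, 2, and it identifies the orbits which contain a subgroup of order 2; the helper names are hypothetical.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Return the permutation p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

G = sorted(permutations(range(3)))   # assumed encoding of D_3
identity = (0, 1, 2)

def translate(g, U):
    """The subset gU obtained by left multiplication."""
    return frozenset(compose(g, u) for u in U)

subsets = [frozenset(c) for c in combinations(G, 2)]
orbits = {frozenset(translate(g, U) for g in G) for U in subsets}
orbit_sizes = sorted(len(orbit) for orbit in orbits)

def is_subgroup(U):
    """A subset of order 2 is a subgroup when it is {1, u} with u*u = 1."""
    return identity in U and all(compose(u, u) == identity for u in U)

# Orbits made up of the cosets of a subgroup of order 2.
coset_orbits = [orbit for orbit in orbits if any(is_subgroup(U) for U in orbit)]
print(len(subsets), orbit_sizes, sorted(len(o) for o in coset_orbits))
```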


(3.6) Proposition. Let U be a subset of a group G. The order of the stabilizer 
Stab (U) of U for the operation of left multiplication divides the order of U. 


Proof. Let H denote the stabilizer of U. Proposition (3.2) tells us that U is a
union of orbits for the operation of H on G. These H-orbits are right cosets Hg. So U
is a union of right cosets. Hence the order of U is a multiple of |H|. □


Of course since the stabilizer is a subgroup of G, its order also divides |G|. So 
if |U| and |G| have no common factor, then Stab (U) is the trivial subgroup {1}. 

The operation by conjugation on subsets of G is also interesting. For example,
we can partition the 15 subsets of $D_3$ of order 2 into orbits for conjugation. The set
$\{H_1, H_2, H_3\}$ of conjugate subgroups is one orbit, and the set $\{x, x^2\}$ forms an orbit
by itself. The other orbits have orders 2, 3, and 6: 15 = 1 + 2 + 3 + 3 + 6.

For our purposes, the important thing is the orbit under conjugation of a subgroup
$H \subset G$. This orbit is the set of conjugate subgroups


$\{gHg^{-1} \mid g \in G\}$.

The subgroup H is normal if and only if its orbit consists of H alone, that is,
$gHg^{-1} = H$ for all $g \in G$.


The stabilizer of a subgroup H for the operation of conjugation is called the 
normalizer of H and is denoted by 


(3.7) $N(H) = \{g \in G \mid gHg^{-1} = H\}$.

The Counting Formula reads

(3.8) $|G| = |N(H)| \cdot |\{\text{conjugate subgroups}\}|$.


Hence the number of conjugate subgroups is equal to the index [G : N(H)].
Note that the normalizer always contains the subgroup


(3.9) $N(H) \supset H$,


because $hHh^{-1} = H$ when $h \in H$. So by Lagrange’s Theorem, |H| divides
|N(H)|, and |N(H)| divides |G|.

In example (3.3), the subgroups $H_1$, $H_2$, $H_3$ are all conjugate, and so by (3.8)
$|N(H_i)| = 6/3 = 2$; hence $N(H_i) = H_i$.
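
Both the conjugation orbits on subsets of order 2 and the normalizer of a subgroup of order 2 in $D_3$ can be computed by brute force. As before, the encoding of $D_3$ as permutations of the vertices 0, 1, 2 and the particular reflection `y` are assumptions of this sketch.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Return the permutation p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugate_set(g, U):
    """The subset g U g^{-1}."""
    g_inv = inverse(g)
    return frozenset(compose(compose(g, u), g_inv) for u in U)

G = sorted(permutations(range(3)))   # assumed encoding of D_3
identity = (0, 1, 2)

subsets = [frozenset(c) for c in combinations(G, 2)]
orbits = {frozenset(conjugate_set(g, U) for g in G) for U in subsets}
orbit_sizes = sorted(len(orbit) for orbit in orbits)

y = (0, 2, 1)                        # a reflection: swaps the vertices 1 and 2
H1 = frozenset({identity, y})
normalizer = [g for g in G if conjugate_set(g, H1) == H1]
conjugates = {conjugate_set(g, H1) for g in G}
print(orbit_sizes, len(normalizer), len(conjugates))
```

The product of the last two numbers is the order of the group, as the Counting Formula (3.8) predicts.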

The definition of the normalizer N (H) shows that H is a normal subgroup of 
N(H), and in fact N (H) is the largest group containing H as a normal subgroup. In 
particular, N(H) = G if and only if H is a normal subgroup of G. 
4. THE SYLOW THEOREMS


The Sylow Theorems, which we will prove in this section, describe the subgroups of 
prime power order of an arbitrary finite group. 

Let G be a group of order n = |G|, and let p be a prime number which divides
n. We will use the following notation: $p^e$ will denote the largest power of p dividing
n, so that


(4.1) $n = p^e m$


for some integer m, and p does not divide m.


(4.2) Theorem. First Sylow Theorem: There is a subgroup of G whose order is $p^e$.


The proofs of the Sylow Theorems are at the end of the section. 


(4.3) Corollary. If a prime p divides the order of a finite group G, then G con- 
tains an element of order p. 


For, let H be a subgroup of order $p^e$, and let x be an element of H different from 1.
The order of x divides $p^e$, so it is $p^r$ for some r in the range $0 < r \le e$. Then $x^{p^{r-1}}$
has order p. □


Without the Sylow Theorem, this corollary is not obvious. We already know 
that the order of any element divides |G|, but we might imagine a group of order 6, 
for example, made up of the identity 1 and five elements of order 2. No such group
exists. According to (4.3), a group of order 6 must contain an element of order 3 
and an element of order 2. 


(4.4) Corollary. There are exactly two isomorphism classes of groups of order 6.
They are the classes of the cyclic group $C_6$ and of the dihedral group $D_3$.


Proof. Let x be an element of order 3 and y an element of order 2 in G. It is
easily seen that the six products $x^i y^j$, $0 \le i \le 2$, $0 \le j \le 1$ are distinct elements
of the group. For we can rewrite an equation $x^i y^j = x^r y^s$ in the form $x^{i-r} = y^{s-j}$.
Every power of x except the identity has order 3, and every power of y except the
identity has order 2. Thus $x^{i-r} = y^{s-j} = 1$, which shows that r = i and s = j.
Since G has order 6, the six elements $1, x, x^2, y, xy, x^2y$ run through the whole
group. In particular, yx must be one of them. It is not possible that yx = y because
this would imply x = 1. Similarly, $yx \neq 1, x, x^2$. Therefore one of the two relations

$yx = xy$ or $yx = x^2y$

holds in G. Either of these relations, together with $x^3 = 1$ and $y^2 = 1$, allows us to
determine the multiplication table for the group. Therefore there are at most two
isomorphism classes of groups of order 6. We know two already, namely the classes of
the cyclic group $C_6$ and of the dihedral group $D_3$. So they are the only ones. □
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(4.5) Definition. [et G be a group of order n = p“m, where p is a-prime not di- 
viding m and e = 1. The subgroups H of G of order p* are called Sylow p- 
subgroups of G, or often just Sylow subgroups. 


Thus a Sylow p-subgroup is a p-subgroup whose index in the group is not divisible
by p. By Theorem (4.2), a finite group G always has a Sylow p-subgroup if p
divides the order of G. The remaining Sylow Theorems (4.6) and (4.8) give more 
information about them. 


(4.6) Theorem. Second Sylow Theorem: Let K be a subgroup of G whose order is
divisible by p, and let H be a Sylow p-subgroup of G. There is a conjugate subgroup
$H' = gHg^{-1}$ such that $K \cap H'$ is a Sylow subgroup of K.


(4.7) Corollary. 


(a) If K is any subgroup of G which is a p-group, then K is contained in a Sylow 
p-subgroup of G. 
(b) The Sylow p-subgroups of G are all conjugate. 


It is clear that a conjugate of a Sylow subgroup is also a Sylow subgroup. So to
obtain the first part of the corollary, we only need to note that the Sylow subgroup of a
p-group K is the group K itself. So if H is a Sylow subgroup and K is a p-group,
there is a conjugate H' such that $K \cap H' = K$, which is to say that H' contains K.
For part (b), let K and H be Sylow subgroups. Then there is a conjugate H' of H
which contains K. Since their orders are equal, K = H'. Thus K and H are
conjugate. □


(4.8) Theorem. Third Sylow Theorem: Let |G| = n, and $n = p^e m$ as in (4.1).
Let s be the number of Sylow p-subgroups. Then s divides m and is congruent to 1
(modulo p): $s \mid m$, and $s = ap + 1$ for some integer $a \ge 0$.


Before proving these theorems, we will use them to determine the groups of 
orders 15 and 21. These examples show how powerful the Sylow Theorems are, but 
do not be misled. The classification of groups of order n is not easy when n has many 
factors. There are just too many possibilities. 
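
The arithmetic conditions of Theorem (4.8) are easy to tabulate. The following Python sketch is an illustration added here, not part of the original argument, and the function name `sylow_count_candidates` is our own. It lists every integer s which divides m and is congruent to 1 modulo p. The theorem only says that the true number of Sylow p-subgroups is among these candidates.

```python
def sylow_count_candidates(n, p):
    """List every s that divides m and satisfies s = 1 (mod p), where n = p^e * m."""
    m = n
    while m % p == 0:  # strip the full power of p from n, leaving m
        m //= p
    return [s for s in range(1, m + 1) if m % s == 0 and s % p == 1]


for n, p in [(15, 3), (15, 5), (21, 3), (21, 7)]:
    print(f"n = {n}, p = {p}: candidates for s are {sylow_count_candidates(n, p)}")
```

For order 15 both lists consist of the single value 1, while for order 21 the prime 3 leaves two candidates, which is the situation analyzed in Proposition (4.9).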


(4.9) Proposition. 


(a) Every group of order 15 is cyclic. 


(b) There are two isomorphism classes of groups of order 21: the class of the
cyclic group $C_{21}$, and the class of the group G having two generators x, y which
satisfy the relations $x^7 = 1$, $y^3 = 1$, $yx = x^2 y$.


Proof. 


(a) Let G be a group of order 15. By (4.8) the number of its Sylow 3-subgroups
divides 5 and is congruent to 1 (modulo 3). The only such integer is 1. Therefore there is
one Sylow 3-subgroup H, and so it is a normal subgroup. There is one Sylow 5-subgroup
K, and it is normal too, for similar reasons. Clearly, $K \cap H = \{1\}$, because
the order of $K \cap H$ divides both 5 and 3. Also, KH is a subgroup of order > 5, and
hence KH = G. By (8.6) in Chapter 2, G is isomorphic to the product group $H \times K$.
Thus every group of order 15 is isomorphic to a direct product of cyclic groups of
orders 3 and 5. All groups of order 15 are isomorphic. Since the cyclic group $C_{15}$ is
one of them, every group of order 15 is cyclic.


(b) Let G be a group of order 21. Then Theorem (4.8) shows that the Sylow 7-subgroup
K must be normal. But the possibility that there are seven conjugate Sylow
3-subgroups H is not ruled out by the theorem, and in fact this case does arise. Let x
denote a generator for K, and y a generator for one of the Sylow 3-subgroups H.
Then $x^7 = 1$, $y^3 = 1$, and, since K is normal, $yxy^{-1} = x^i$ for some i < 7.
We can restrict the possible exponents i by using the relation $y^3 = 1$. It implies
that

$x = y^3 x y^{-3} = y^2 x^i y^{-2} = y x^{i^2} y^{-1} = x^{i^3}$.


Hence $i^3 \equiv 1$ (mod 7). This means that i can take the values 1, 2, 4.


Case 1: $yxy^{-1} = x$. The group is abelian, and by (8.6) in Chapter 2 it is isomorphic
to a direct product of cyclic groups of orders 3 and 7. Such a group is cyclic
[Chapter 2 (8.4)].

Case 2: $yxy^{-1} = x^2$. The multiplication in G can be carried out using the rules
$x^7 = 1$, $y^3 = 1$, $yx = x^2 y$, to reduce every product of the elements x, y to one of
the forms $x^i y^j$ with $0 \le i < 7$ and $0 \le j < 3$. We leave the proof that this group
actually exists as an exercise.

Case 3: $yxy^{-1} = x^4$. In this case, we replace y by $y^2$, which is also a generator for
H, to reduce to the previous case: $y^2 x y^{-2} = y x^4 y^{-1} = x^{16} = x^2$. Thus there are two
isomorphism classes of groups of order 21, as claimed. □
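
The two computational claims in this proof can be checked by brute force. The sketch below is our own illustration, and the encoding of $x^a y^b$ as the pair `(a, b)` is an assumption made for the example. It first solves $i^3 \equiv 1$ (mod 7), then multiplies normal forms by the rule $(x^a y^b)(x^c y^d) = x^{a + 2^b c}\, y^{b+d}$, which follows from $yx = x^2 y$, and verifies the group axioms on all 21 pairs. It is a numerical check, not a substitute for the exercise.

```python
from itertools import product

# Exponents i with i^3 = 1 (mod 7), as required by the relation y^3 = 1.
exponents = [i for i in range(1, 7) if pow(i, 3, 7) == 1]


def mul(g, h):
    """Multiply x^a y^b by x^c y^d using y x = x^2 y, i.e. y^b x^c = x^(2^b c) y^b."""
    (a, b), (c, d) = g, h
    return ((a + c * pow(2, b, 7)) % 7, (b + d) % 3)


elements = list(product(range(7), range(3)))  # all normal forms x^a y^b
identity = (0, 0)
x, y = (1, 0), (0, 1)

associative = all(
    mul(mul(g, h), k) == mul(g, mul(h, k))
    for g in elements for h in elements for k in elements
)
has_inverses = all(any(mul(g, h) == identity for h in elements) for g in elements)

print(exponents, associative, has_inverses, mul(y, x) == mul(mul(x, x), y))
```

Since `mul(x, y)` and `mul(y, x)` differ, the group obtained this way is not abelian, so it is not the cyclic group of Case 1.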


We will now prove the Sylow Theorems. 


Proof of the First Sylow Theorem. We let $\mathcal{S}$ be the set of all subsets of G of
order $p^e$. One of these subsets is the subgroup we are looking for, but instead of
finding it directly we will show that one of these subsets has a stabilizer of order $p^e$.
The stabilizer will be the required subgroup.


(4.10) Lemma. The number of subsets of order $p^e$ in a set of $n = p^e m$ elements
(p not dividing m) is the binomial coefficient


$N = \binom{n}{p^e} = \dfrac{n(n-1)\cdots(n-k)\cdots(n-p^e+1)}{p^e(p^e-1)\cdots(p^e-k)\cdots 1}$.


Moreover N is not divisible by p.

Proof. It is a standard fact that the number of subsets of order $p^e$ is this binomial
coefficient. To see that N is not divisible by p, note that every time p divides a
term (n - k) in the numerator of N, it also divides the term $(p^e - k)$ of the denominator
exactly the same number of times: If we write k in the form $k = p^i l$, where p
does not divide l, then i < e. Therefore (n - k) and $(p^e - k)$ are both divisible by
$p^i$ but not divisible by $p^{i+1}$. □
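
Lemma (4.10) can be spot-checked numerically. The short Python sketch below is added for illustration only; the name `subset_count` and the sample values of p, e, m are our own choices. It computes the binomial coefficient with the standard library function `math.comb` and confirms that it is not divisible by p whenever p does not divide m.

```python
from math import comb


def subset_count(p, e, m):
    """Number N of subsets of order p^e in a set of n = p^e * m elements."""
    n = p**e * m
    return comb(n, p**e)


# Sample triples (p, e, m) with p not dividing m.
samples = [(2, 2, 3), (3, 1, 5), (3, 2, 4), (5, 1, 6), (7, 1, 3)]
for p, e, m in samples:
    N = subset_count(p, e, m)
    print(f"p = {p}, e = {e}, m = {m}: N mod p = {N % p}")
```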


We decompose $\mathcal{S}$ into orbits for the operation of left multiplication, obtaining
the formula

$N = |\mathcal{S}| = \sum_{\text{orbits } O} |O|$.

Since p does not divide N, some orbit has an order which is not divisible by p, say
the orbit $O_U$ of the subset U. We now apply Proposition (3.6) to conclude that |Stab(U)|
is a power of p. Since


(4.11) $|\mathrm{Stab}(U)| \cdot |O_U| = |G| = p^e m$


by the Counting Formula, and since $|O_U|$ is not divisible by p, it follows that
$|\mathrm{Stab}(U)| = p^e$. This stabilizer is the required subgroup. □


Proof of the Second Sylow Theorem. We are given a subgroup K and a Sylow
subgroup H of G, and we are to show that for some conjugate subgroup H' of H, the
intersection $K \cap H'$ is a Sylow subgroup of K.

Let S denote the set of left cosets G/H. The facts that we need about this set
are that G operates transitively, that is, the set forms a single orbit, and that H is the
stabilizer of one of its elements, namely of s = 1H. So the stabilizer of $as$ is the
conjugate subgroup $aHa^{-1}$ [see Chapter 5 (6.5b)].

We restrict the operation of G to K and decompose S into K-orbits. Since H is
a Sylow subgroup, the order of S is prime to p. So there is some K-orbit O whose
order is prime to p. Say that O is the K-orbit of the element $as$. Let H' denote the
stabilizer $aHa^{-1}$ of $as$ for the operation of G. Then the stabilizer of $as$ for the
restricted operation of K is obviously $H' \cap K$, and the index $[K : H' \cap K]$ is |O|,
which is prime to p. Also, since it is a conjugate of H, H' is a p-group. Therefore
$H' \cap K$ is a p-group. It follows that $H' \cap K$ is a Sylow subgroup of K. □


Proof of the Third Sylow Theorem. By Corollary (4.7), the Sylow subgroups of
G are all conjugate to a given one, say to H. So the number of Sylow subgroups is
s = [G : N], where N is the normalizer of H. Since $H \subset N$, [G : N] divides
[G : H] = m. To show $s \equiv 1$ (modulo p), we decompose the set $\{H_1, \ldots, H_s\}$ of Sylow
subgroups into orbits for the operation of conjugation by $H = H_1$. An orbit consists
of a single group $H_i$ if and only if H is contained in the normalizer $N_i$ of $H_i$. If
so, then H and $H_i$ are both Sylow subgroups of $N_i$, and $H_i$ is normal in $N_i$. Corollary
(4.7b) shows that $H = H_i$. Therefore there is only one H-orbit of order 1, namely
{H}. The other orbits have orders divisible by p because their orders divide |H|, by
the Counting Formula. This shows that $s \equiv 1$ (modulo p). □


5. THE GROUPS OF ORDER 12


In this section, we use the Sylow Theorems to classify the groups of order 12:


(5.1) Theorem. There are five isomorphism classes of groups of order 12. They
are represented by:
(i) the product of cyclic groups $C_3 \times C_4$;
(ii) the product of cyclic groups $C_2 \times C_2 \times C_3$;
(iii) the alternating group $A_4$;
(iv) the dihedral group $D_6$;
(v) the group generated by x, y, with relations $x^4 = 1$, $y^3 = 1$, $xy = y^2 x$.


Note that $C_3 \times C_4$ is isomorphic to $C_{12}$ and that $C_2 \times C_2 \times C_3$ is isomorphic to
$C_2 \times C_6$ (see [Chapter 2 (8.4)]).


Proof. Let G be a group of order 12. Denote by H a Sylow 2-subgroup of G, 
which has order 4, and by K a Sylow 3-subgroup, of order 3. It follows from Theo- 
rem (4.8) that the number of Sylow 2-subgroups is either 1 or 3, and that the number 
of Sylow 3-subgroups is 1 or 4. Also, H is a group of order 4 and is therefore either 
a cyclic group or the Klein four group V, a product of two cyclic groups of order 2: 


(5.2) $H \approx C_4$ or $H \approx V$.


(5.3) Lemma. At least one of the two subgroups H, K is normal.


Proof. Suppose that K is not normal. Then K has four conjugate subgroups
$K = K_1, \ldots, K_4$. Since $|K_i| = 3$, the intersection of any two of these groups must be
the identity. Counting elements (the four groups contain 4 · 2 = 8 elements of order 3,
and 12 - 8 - 1 = 3) shows that there are only three elements of G other than the
identity which are not in any of the groups $K_i$.


Any Sylow 2-subgroup H has order 4, and $H \cap K_i = \{1\}$. Therefore it consists of
these three elements and 1. This describes H for us and shows that there is only one
Sylow 2-subgroup. Thus H is normal. □

Since $H \cap K = \{1\}$, every element of HK has a unique expression as a
product hk [Chapter 2 (8.6)], and since |G| = 12, HK = G. If H is normal, then K
operates on H by conjugation, and we will show that this operation, together with
the structure of H and K, determines the structure of G. Similarly, if K is normal
then H operates on K, and this operation determines G.


Case 1: H and K are both normal. Then by (8.6) in Chapter 2, G is isomorphic to
the product group $H \times K$. By (5.2) there are two possibilities:


(5.4) $G \approx C_4 \times C_3$ or $G \approx V \times C_3$.

These are the abelian groups of order 12.


Case 2: H is normal but K is not. So there are four conjugate Sylow 3-subgroups
$\{K_1, \ldots, K_4\}$, and G operates by conjugation on this set S of four subgroups. This
operation determines a permutation representation


(5.5) $\varphi: G \to S_4$.


Let us show that $\varphi$ maps G isomorphically to the alternating group $A_4$ in this case.

The stabilizer of $K_i$ for the operation of conjugation is the normalizer $N(K_i)$,
which contains $K_i$. The Counting Formula shows that $|N(K_i)| = 3$, and hence that
$N(K_i) = K_i$. Since the only element common to the subgroups $K_i$ is the identity element,
only the identity stabilizes all of these subgroups. Thus $\varphi$ is injective and G is
isomorphic to its image in $S_4$.

Since G has four subgroups of order 3, it contains eight elements of order 3,
and these elements certainly generate the group. If x has order 3, then $\varphi(x)$ is a
permutation of order 3 in $S_4$. The permutations of order 3 are even. Therefore
$\operatorname{im} \varphi \subset A_4$. Since $|G| = |A_4|$, the two groups are equal.

As a corollary, we note that if H is normal and K is not, then H is the Klein
four group V, because the Sylow 2-subgroup of $A_4$ is V.
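
The counts used in Case 2 can be confirmed directly in the alternating group on four indices. The following Python sketch is our own illustration; permutations are stored as tuples on the indices 0, 1, 2, 3, which is an encoding chosen for the example. It lists the even permutations, computes the order of each, and checks that the elements of order at most 2 form a subgroup of order 4.

```python
from itertools import permutations


def compose(p, q):
    """Product 'first p, then q' of two permutations stored as tuples."""
    return tuple(q[p[i]] for i in range(len(p)))


def is_even(p):
    """A permutation is even when its number of inversions is even."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0


def order(p):
    """Smallest k >= 1 such that p^k is the identity."""
    identity = tuple(range(len(p)))
    k, power = 1, p
    while power != identity:
        power = compose(power, p)
        k += 1
    return k


A4 = [p for p in permutations(range(4)) if is_even(p)]
orders = sorted(order(p) for p in A4)

# The elements of order 1 or 2: the only candidates for a subgroup of order 4.
V = [p for p in A4 if order(p) <= 2]
closed = all(compose(a, b) in V for a in V for b in V)

print(orders, len(V), closed)
```

Eight elements of order 3 account for four subgroups of order 3, and the remaining four elements make up the single Sylow 2-subgroup, in which every nonidentity element has order 2.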


Case 3: K is normal, but H is not. In this case H operates on K by conjugation, and
conjugation by an element of H is an automorphism of K. We let y be a generator for
the cyclic group K: $y^3 = 1$. There are only two automorphisms of K: the identity
and the automorphism which interchanges y and $y^2$.

Suppose that H is cyclic of order 4, and let x generate H: $x^4 = 1$. Then since G
is not abelian, $xy \ne yx$, and so conjugation by x is not the trivial automorphism of
K. Hence $xyx^{-1} = y^2$. The Todd-Coxeter Algorithm (see Section 9) is one way to
show that these relations define a group of order 12:


(5.6) $x^4 = 1$, $y^3 = 1$, $xyx^{-1} = y^2$.


The last possibility is that H is isomorphic to the Klein four group. Since there
are only two automorphisms of K, there is an element $w \in H$ besides the identity
which operates trivially: $wyw^{-1} = y$. Since G is not abelian, there is also an element
v which operates nontrivially: $vyv^{-1} = y^2$. Then the elements of H are {1, v, w, vw},
and the relations $v^2 = w^2 = 1$, and vw = wv hold in H. The element x = wy has
order 6, and $vxv^{-1} = vwyv^{-1} = wy^2 = y^2 w = x^{-1}$. The relations $x^6 = 1$, $v^2 = 1$,
$vxv^{-1} = x^{-1}$ define the group $D_6$, so G is dihedral in this case. □


6. COMPUTATION IN THE SYMMETRIC GROUP 


We want to bring up two points about calculation with permutations. The first con- 
cerns the order of multiplication. To have a uniform convention, we have used the 
functional notation p(x) for all our maps p, including permutations. This has the 
consequence that a product pq must be interpreted as the composed operation p ° q, 
that is, “first apply q, then p.” When multiplying permutations, it is more usual to 
read pq as “first apply p, then q.” We will use this second convention here. A compatible
notation for the operation of a permutation p on an index i requires writing
the permutation on the right side of the index: 


(i)p. 


Applying first p and then q to an index i, we get ((i)p)q = (i)pq, as desired. Actually,
this notation looks funny to me. We will usually drop the parentheses:

(i)p = ip. 
What is important is that p must appear on the right. 

To make our convention for multiplication compatible with matrix multiplication,
we must replace the matrix P associated to a permutation p in Chapter 1 (4.6)
by its transpose $P^t$, and use it to multiply on the right on a row vector.

The second point is that it is not convenient to compute with permutation matrices,
because the matrices are large in relation to the amount of information they
contain. A better notation is needed. One way to describe a permutation is by means
of a table. We can consider the configuration


(6.1) $p = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 4 & 6 & 8 & 3 & 5 & 2 & 1 & 7 \end{bmatrix}$

as a notation for the permutation defined by

$1p = 4$, $2p = 6$, ....


It is easy to compute products using this notation. If for example

$q = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 4 & 6 & 8 & 1 & 3 & 5 & 7 \end{bmatrix}$,

then we can evaluate pq (first p, then q) by reading the two tables in succession:

$pq = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 8 & 3 & 7 & 6 & 1 & 4 & 2 & 5 \end{bmatrix}$.


Table (6.1) still requires a lot of writing, and of course the top row is always
the same. It could, in principle, be left off, to reduce the amount of writing by half,
but this would make it hard to find our place in the bottom row if we were permuting,
say, 18 digits.
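
Reading two tables in succession is a one-line computation. In the Python sketch below, which is an illustration of ours and not part of the text, a permutation is stored as the bottom row of its table, and p and q are taken to be the permutations whose cycle notations (14387)(26) and (124875)(36) are given later in this section. The function name `product_table` is hypothetical.

```python
# Bottom rows of the tables: the index i (counted from 1) is sent to row[i - 1].
p = [4, 6, 8, 3, 5, 2, 1, 7]
q = [2, 4, 6, 8, 1, 3, 5, 7]


def product_table(first, second):
    """Bottom row of the product 'apply first, then second'."""
    return [second[first[i] - 1] for i in range(len(first))]


pq = product_table(p, q)
qp = product_table(q, p)
print("pq:", pq)
print("qp:", qp)
```

Comparing the two printed rows shows that the order of the factors matters for these two permutations.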

Another notation, called cycle notation, is commonly used. It describes a per- 
mutation of n elements by at most n symbols and is based on the partition of the in- 
dices into orbits for the operation of a permutation. Let p be a permutation, and let 
H be the cyclic subgroup generated by p. We decompose the set {1,..., n} into H-orbits
and refer to these orbits as the p-orbits. The p-orbits form a partition of the
set of indices, called the cycle decomposition associated to the permutation p. 

If an index i is in an orbit of k elements, the elements of the orbit will be


$O = \{i, ip, ip^2, \ldots, ip^{k-1}\}$.

Let us denote $ip^r$ by $i_r$, so that $O = \{i_0, i_1, \ldots, i_{k-1}\}$. Then p operates on this orbit as


(6.2) $i_0 \to i_1 \to \cdots \to i_{k-1} \to i_0$.

A permutation which operates in this way on a subset $\{i_0, i_1, \ldots, i_{k-1}\}$ of the indices
and leaves the remaining indices fixed is called a cyclic permutation. Thus


(6.3) $\sigma = (1 \to 4 \to 3 \to 8 \to 7 \to 1)$


defines a cyclic permutation of order 5 of {1,..., 8}, it being understood that the indices
2, 5, 6 which are not mentioned are left fixed; each forms a $\sigma$-orbit of one element.
When we speak of the indices on which a permutation operates, we will
mean the ones which are not fixed: 1, 3, 4, 7, 8 in this case.

Another cyclic permutation of {1,..., 8} is


(6.4) $\tau = (2 \to 6 \to 2)$.


Such a cyclic permutation of order 2 is called a transposition. A transposition is a
permutation which operates on two indices.


Our permutation p (6.1) is not cyclic because there are three p-orbits:


$\{1, 4, 3, 8, 7\}$, $\{2, 6\}$, $\{5\}$,


on which p operates as $\sigma$, as $\tau$, and trivially, respectively.

It is clear that

$p = \sigma\tau = \tau\sigma$,


where $\sigma\tau$ denotes the product permutation.


(6.5) Proposition. Let $\sigma, \tau$ be permutations which operate on disjoint sets of indices.
Then $\sigma\tau = \tau\sigma$.


Proof. If neither $\sigma$ nor $\tau$ operates on an index i, then $i\sigma\tau = i\tau\sigma = i$. If $\sigma$
sends i to $j \ne i$, then $\tau$ fixes both i and j. In that case, $i\sigma\tau = j\tau = j$ and
$i\tau\sigma = i\sigma = j$ too. The case that $\tau$ operates on i is the same. □


Note, however, that when we multiply permutations which operate on overlapping
sets of indices, the operations need not commute. The symmetric group $S_n$ is
not a commutative group if n > 2. For example, if $\tau'$ is the transposition which
interchanges 3 and 6 and if $\sigma$ is as above, then $\sigma\tau' \ne \tau'\sigma$.


(6.6) Proposition. Every permutation p not the identity is a product of cyclic
permutations which operate on disjoint sets of indices: $p = \sigma_1\sigma_2 \cdots \sigma_k$, and these cyclic
permutations $\sigma_r$ are uniquely determined by p.


Proof. We know that p operates as a cyclic permutation when restricted to a
single orbit. For each p-orbit, we may define a cyclic permutation $\sigma_r$ which permutes
that orbit in the same way that p does and which fixes the other indices. Clearly, p is
the product of these cyclic permutations. Conversely, let p be written as a product
$\sigma_1\sigma_2 \cdots \sigma_k$ of cyclic permutations operating on disjoint sets $O_1, \ldots, O_k$ of indices.
According to Proposition (6.5), the order does not matter. Note that $\sigma_2, \ldots, \sigma_k$ fix the
elements of $O_1$; hence p and $\sigma_1$ act in the same way on $O_1$. Therefore $O_1$ is a p-orbit.
The same is true for the other cyclic permutations. Thus $O_1, \ldots, O_k$ are the p-orbits
which contain more than one element, and the permutations $\sigma_i$ are those constructed
at the start of the proof. □


A cycle notation for the cyclic permutation (6.2) is

(6.7) $(i_0 i_1 \cdots i_{k-1})$.


Thus our particular permutation $\sigma$ has the cycle notation (14387). The notation is
not completely determined by the permutation, because we can start the list with
any of the indices $i_0, \ldots, i_{k-1}$. There are five equivalent notations for $\sigma$:


$\sigma = (43871) = (38714) = \cdots$.


Any one of these notations may be used. 

A cycle notation for an arbitrary permutation p is obtained by writing the per- 
mutation as a product of cyclic permutations which operate on disjoint indices, and 
then writing the cycle notations for each of these permutations in succession. The or- 
der is irrelevant. Thus two of the possible cycle notations for the permutation p de- 
scribed above are 


(14387)(26) and (62)(87143). 


If we wish, we can include the “one-cycle” (5), to represent the fixed element 5, 
thereby presenting all the indices in the list. But this is not customary. 
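
Passing from a table to cycle notation amounts to following each index around its orbit. The Python sketch below is our own illustration. It assumes that the bottom row of table (6.1) is 4 6 8 3 5 2 1 7, which is what the cycle notation (14387)(26) amounts to, and it omits the one-cycles, as is customary. The function name `cycles` is hypothetical.

```python
def cycles(table):
    """Disjoint cycles of length > 1 of the permutation whose bottom row is `table`."""
    seen, result = set(), []
    for start in range(1, len(table) + 1):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:  # follow the orbit of `start` until it closes up
            seen.add(i)
            cycle.append(i)
            i = table[i - 1]
        if len(cycle) > 1:    # drop fixed indices
            result.append(tuple(cycle))
    return result


p = [4, 6, 8, 3, 5, 2, 1, 7]
print(cycles(p))
```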




With this notation, every permutation can be denoted by a string of at most n
integers, suitably bracketed. Products can still be described by juxtaposition. A cycle
notation for the permutation q considered above is q = (124875)(36). Thus


(6.8) $pq = (14387)(26)(124875)(36) = \sigma\tau\sigma'\tau'$,


where $\sigma' = (124875)$ and $\tau' = (36)$.


This string of cycles represents the permutation pq. To evaluate the product on an index,
the index is followed through the four factors:


$1 \to 4 \to 4 \to 8 \to 8$, $2 \to 2 \to 6 \to 6 \to 3$, and so on.


However, (6.8) does not exhibit the decomposition of pq into disjoint cycles, because
indices appear more than once. Computation of the permutation as above leads
to the cycle decomposition


$pq = (185)(237)(46)$.


When the computation is finished, every index occurs at most once.
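
The procedure of following each index through a string of cycles is mechanical, and can be sketched in Python as follows. This is an illustration of ours, with hypothetical function names, using the convention of this section that the cycles are applied from left to right.

```python
def apply_cycle(cycle, i):
    """Image of the index i under a single cycle."""
    if i in cycle:
        return cycle[(cycle.index(i) + 1) % len(cycle)]
    return i


def apply_string(cycle_string, i):
    """Follow the index i through the cycles from left to right."""
    for cycle in cycle_string:
        i = apply_cycle(cycle, i)
    return i


def disjoint_cycles(cycle_string, n):
    """Cycle decomposition (without one-cycles) of a product of cycles on 1..n."""
    seen, result = set(), []
    for start in range(1, n + 1):
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = apply_string(cycle_string, i)
        if len(cycle) > 1:
            result.append(tuple(cycle))
    return result


pq = [(1, 4, 3, 8, 7), (2, 6), (1, 2, 4, 8, 7, 5), (3, 6)]
print(disjoint_cycles(pq, 8))
```

Because each index is visited once, every index occurs at most once in the output, as the text requires.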
For another sample, let $\rho = (548)$. Then


(6.9) $\sigma\rho = (14387)(548) = (187)(354)$
$\rho\sigma = (548)(14387) = (147)(385)$.


Now let us compute the conjugate of a permutation p. Since p is a product of
disjoint cycles, it will be enough to describe the conjugate $q^{-1}\sigma q$ of a cyclic permutation
$\sigma$, say the permutation $(i_1 \cdots i_k)$. (The fact that we have switched the order of
multiplication makes the expression for conjugation by $q^{-1}$ a little nicer than that for
conjugation by q.)


(6.10) Proposition. 


(a) Let $\sigma$ denote the cyclic permutation $(i_1 i_2 \cdots i_k)$, and let q be any permutation.
Denote the index $i_r q$ by $j_r$. Then the conjugate permutation $q^{-1}\sigma q$ is the cyclic
permutation $(j_1 j_2 \cdots j_k)$.

(b) If an arbitrary permutation p is written as a product of disjoint cycles $\sigma$, then
$q^{-1}pq$ is the product of the disjoint cycles $q^{-1}\sigma q$.

(c) Two permutations p, p’ are conjugate elements of the symmetric group if and 
only if their cycle decompositions have the same orders. 


Proof. The proof of (a) is the following computation:

$j_r q^{-1}\sigma q = i_r \sigma q = i_{r+1} q = j_{r+1}$,

in which the indices are to be read modulo k. Part (b) follows easily. Also, the fact
that conjugate permutations have cycle decompositions with the same orders follows
from (b). Conversely, suppose that p and p' have cycle decompositions of the same
orders. Say that $p = (i_1 \cdots i_r)(i_1' \cdots i_s') \cdots$ and $p' = (j_1 \cdots j_r)(j_1' \cdots j_s') \cdots$. Define
q to be the permutation sending $i_\nu \mapsto j_\nu$, $i_\nu' \mapsto j_\nu'$, and so on. Then
$p' = q^{-1}pq$. □
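
Part (a) says that conjugating a cycle simply relabels its entries. The Python sketch below, an illustration of ours with hypothetical names, checks this on the cycle (14387) and the permutation q = (124875)(36) used earlier in the section, comparing the relabeled cycle with a direct evaluation of $q^{-1}\sigma q$ on every index.

```python
def apply_cycle(cycle, i):
    """Image of the index i under a single cycle."""
    if i in cycle:
        return cycle[(cycle.index(i) + 1) % len(cycle)]
    return i


# q as a dictionary i -> iq, built from the bottom row of its table.
q = {i + 1: v for i, v in enumerate([2, 4, 6, 8, 1, 3, 5, 7])}
q_inv = {v: k for k, v in q.items()}

sigma = (1, 4, 3, 8, 7)

# Proposition (6.10a): replace each entry i_r of the cycle by j_r = i_r q.
relabeled = tuple(q[i] for i in sigma)

# Direct computation: j -> j q^{-1} -> (j q^{-1}) sigma -> ((j q^{-1}) sigma) q.
direct = {j: q[apply_cycle(sigma, q_inv[j])] for j in range(1, 9)}

agrees = all(direct[j] == apply_cycle(relabeled, j) for j in range(1, 9))
print(relabeled, agrees)
```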

Let us determine the Class Equation for the symmetric group $S_4$ as an example.
This group contains six transpositions 

(12), (13), (14), (23), (24), (34), 
three products of disjoint transpositions 
(12)(34), (13)(24), (14)(23), 


eight 3-cycles, and six 4-cycles. By Proposition (6.10), each of these sets forms one 
conjugacy class. So the Class Equation of $S_4$ is


$24 = 1 + 3 + 6 + 6 + 8$.
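
By Proposition (6.10c), the conjugacy classes are the sets of permutations with a given list of cycle lengths, so the class sizes can be counted by enumeration. The following Python sketch is our own illustration; it indexes from 0 for convenience, and the function name `cycle_type` is hypothetical.

```python
from collections import Counter
from itertools import permutations


def cycle_type(perm):
    """Sorted cycle lengths (one-cycles included) of a permutation of 0..n-1."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths))


class_sizes = Counter(cycle_type(p) for p in permutations(range(4)))
for kind, size in sorted(class_sizes.items()):
    print(kind, size)
```

The five counts are the terms of the Class Equation, one for each way of writing 4 as a sum of cycle lengths.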


We will now describe the subgroups G of the symmetric group $S_p$ whose order
is divisible by p and whose Sylow p-subgroup is normal. We assume that p is a
prime integer. Since p divides $p! = |S_p|$ only once, it also divides |G| once, and so
the Sylow p-subgroup of G is a cyclic group.

It turns out that such subgroups have a very nice description in terms of the
finite field $F_p$. To obtain it, we use the elements {0, 1, ..., p-1} of the finite field as
the indices. Certain permutations of this set are given by the field operations themselves.
Namely, we have the operations (add a) and (multiply by c) for any given
$a, c \in F_p$, $c \ne 0$. They are invertible operations and hence permutations of $F_p$, so
they represent elements of the symmetric group. For example, (add 1) is the p-cycle


(6.11) (add 1) = $(0\,1\,2 \cdots (p-1))$.


The operator (multiply by c) always fixes the index 0, but its cycle decomposition depends
on the order of the element c in $F_p^\times$. For example,


(6.12) (multiply by 2) = (1243) if p = 5
(multiply by 2) = (124)(365) if p = 7.

Combining the operations of addition and multiplication gives us all operators on $F_p$
of the form

(6.13) $x \mapsto cx + a$.


The set of these operators forms a subgroup G of order p(p-1) of the symmetric
group.
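
For a small prime the group of operators (6.13) can be written out completely. The Python sketch below is an illustration of ours; each operator is stored as the tuple of its values on 0, ..., p-1, and the function names are hypothetical. It checks the order p(p-1) and closure under composition for p = 7, and recovers the first line of (6.12).

```python
from itertools import product


def affine_group(p):
    """All operators x -> c*x + a on F_p with c != 0, each stored as a tuple of images."""
    return {
        tuple((c * x + a) % p for x in range(p))
        for c, a in product(range(1, p), range(p))
    }


def compose(f, g):
    """Product 'first f, then g'."""
    return tuple(g[f[x]] for x in range(len(f)))


G = affine_group(7)
closed = all(compose(f, g) in G for f in G for g in G)
print(len(G), closed)

# (multiply by 2) for p = 5: follow 1 around its orbit.
times_two = tuple(2 * x % 5 for x in range(5))
orbit, i = [], 1
while i not in orbit:
    orbit.append(i)
    i = times_two[i]
print(orbit)
```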
The group of operators (6.13) has a nice matrix representation, as the set of
2 × 2 matrices with entries in the field $F_p$, of the form


(6.14) $\begin{bmatrix} 1 & a \\ 0 & c \end{bmatrix}$.

This matrix operates by right multiplication on the vector (1, x), sending it to
(1, cx + a). So we can recover the operation of G on $F_p$ from right multiplication by
the corresponding matrix. (We use right multiplication because of our chosen order
of operations.) The operations (add a) and (multiply by c) are represented by the
elementary matrices

$\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0 \\ 0 & c \end{bmatrix}$.
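
The matrix description can be tested on a small prime. The sketch below is our own illustration. It assumes that (6.14) is the matrix with rows (1, a) and (0, c), which is the form forced by the requirement that right multiplication send (1, x) to (1, cx + a); the helper names are hypothetical.

```python
def row_times_matrix(v, M, p):
    """Product of the row vector v with the 2 x 2 matrix M, entries reduced mod p."""
    return tuple(sum(v[i] * M[i][j] for i in range(2)) % p for j in range(2))


def matrix(c, a):
    """Assumed form of (6.14): rows (1, a) and (0, c)."""
    return ((1, a), (0, c))


p = 7
ok = all(
    row_times_matrix((1, x), matrix(c, a), p) == (1, (c * x + a) % p)
    for c in range(1, p) for a in range(p) for x in range(p)
)
print(ok)
```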


(6.15) Theorem. Let p be a prime, and let H be a subgroup of the symmetric
group $S_p$ whose order is divisible by p. If the Sylow p-subgroup of H is normal,
then, with suitable labeling of the indices, H is contained in the group of operators
of the form (6.13).


For example, the dihedral group $D_p$ operates faithfully on the vertices of a regular
p-gon, and so it is realized as a subgroup of the symmetric group $S_p$. It is the
subgroup of (6.14) consisting of the matrices in which $c = \pm 1$.


Proof of the theorem. The only elements of order p of $S_p$ are the p-cycles. So
H contains a p-cycle, say $\sigma$. We may relabel indices so that $\sigma$ becomes the standard
p-cycle (add 1) = $(0\,1 \cdots (p-1))$. Then this permutation generates the Sylow
p-subgroup of H.

Let $\tau_1$ be another element of H. We have to show that $\tau_1$ corresponds to an
operator of the form (6.13). Say that $\tau_1$ sends the index 0 to i. Since $\sigma^i$ also sends 0 to
i, the product $\tau = \tau_1\sigma^{-i}$ (first $\tau_1$, then $\sigma^{-i}$) fixes 0. It suffices to show that $\tau$ has
the form (6.13), and to do so, we will show that $\tau$ is one of the operators (multiply by c).

By hypothesis, $K = \{1, \sigma, \ldots, \sigma^{p-1}\}$ is a normal subgroup of H. Therefore


(6.16) $\tau^{-1}\sigma\tau = \sigma^k$

for some k between 1 and p-1. We now determine $\tau$ by computing both sides of this
equation. By Proposition (6.10), the left side is the p-cycle $\tau^{-1}\sigma\tau = (0\tau\ 1\tau \cdots (p-1)\tau)$,
while direct computation of the right side gives $\sigma^k = (0\ k\ 2k \cdots (p-1)k)$:

$(0\tau\ 1\tau \cdots (p-1)\tau) = (0\ k\ 2k \cdots (p-1)k)$.


We must be careful in interpreting the equality of these two cycles, because the cycle
notation is not unique. We need to know that the first index on the left is the same as
the first index on the right. Otherwise we will have to identify equal indices in the
two cycles and begin with them. That is why we normalized at the start, to have
$0\tau = 0$. Knowing that fact, the two lists are the same, and we conclude that


$1\tau = k$, $2\tau = 2k$, ....

This is the operator (multiply by k), as claimed. □


We now return for a moment to the question of order of operations. If we wish
to use the notation p(i) for permutations in this section, as we do for functions elsewhere,
we must modify our way of computing with cycles in order to take this into
account. The most systematic way to proceed is to read everything, including cycles,
from right to left. In other words, we should read the cycle (14387) as


$1 \leftarrow 4 \leftarrow 3 \leftarrow 8 \leftarrow 7 \leftarrow 1$.


This is the inverse of the permutation (6.3). We can then interpret the product
(14387)(548) as composition: "First apply (548), then (14387)." Computation
of this product gives


$1 \leftarrow 8 \leftarrow 7 \leftarrow 1$, $3 \leftarrow 5 \leftarrow 4 \leftarrow 3$,


which we would write as (187)(354). Notice that this is the same string of symbols
as we obtained in (6.9). Miraculously, reading everything backward gives the same
answer when we multiply permutations. But of course, the notation (187)(354) now
stands for the inverse of the permutation (6.9). The fact that the notations multiply
consistently in our two ways of reading permutations mitigates the crime we have
committed in switching from left to right.


7. THE FREE GROUP 


We have seen a few groups, such as the symmetric group $S_3$, the dihedral groups $D_n$,
and the group M of rigid motions of the plane, in which one can compute easily using
a list of generators and a list of relations for manipulating them. The rest of this
chapter is devoted to the formal background for such methods. In this section, we
consider groups which have a set of generators satisfying no relations other than ones
[such as x(yz) = (xy)z] which are implied by the group axioms. A set S of elements
of a group which satisfy no relations except those implied by the axioms is called 
free, and a group which has a free set of generators is called a free group. We will 
now describe the free groups. 

We start with an arbitrary set S of symbols, say S = {a,b,c,...}, which may be 
finite or infinite, and define a word to be a finite string of symbols from S, in which 
repetition is allowed. For instance a, aa, ba, and aaba are words. Two words can be 
composed by juxtaposition: 

$aa,\ ba \rightsquigarrow aaba$;


in this way the set W of all words has an associative law of composition. Moreover, 
the “empty word” can be introduced as an identity element for this law. We will 
need a symbol to denote the empty word; let us use 1. The set W is called the free
semigroup on the set of symbols S. Unfortunately it is not a group because inverses 
are lacking, and the introduction of inverses complicates things. 

Let S' be the set consisting of the symbols in S and also of symbols $a^{-1}$ for
every $a \in S$:

(7.1) $S' = \{a, a^{-1}, b, b^{-1}, c, c^{-1}, \ldots\}$.

Let W' be the set of words made using the symbols S'. If a word $w \in W'$ looks
like

$\cdots x x^{-1} \cdots$ or $\cdots x^{-1} x \cdots$

for some $x \in S$, then we can agree to cancel the two symbols $x, x^{-1}$ and reduce the
length of the word. The word will be called reduced if no such cancellation can be
made. Starting with any word w, we can perform a finite sequence of cancellations
and must eventually get a reduced word $w_0$, possibly the empty word 1. We call this
word $w_0$ a reduced form of w.

Now there is often more than one way to proceed with cancellation. For instance,
starting with $w = babb^{-1}a^{-1}c^{-1}ca$, we can proceed in several ways, such as


$babb^{-1}a^{-1}c^{-1}ca \to baa^{-1}c^{-1}ca \to bc^{-1}ca \to ba$

or

$babb^{-1}a^{-1}c^{-1}ca \to babb^{-1}a^{-1}a \to babb^{-1} \to ba$.


The same reduced word is obtained at the end, though the letters come from different
places in the original word: the a which remains is the last letter of w in the first
reduction and the second letter of w in the second. This is the general situation.


(7.2) Proposition. There is only one reduced form of a given word w. 


Proof. We use induction on the length of w. If w is reduced, there is nothing to
show. If not, there must be some pair of letters which can be cancelled, say the underlined
pair


$w = \cdots \underline{x x^{-1}} \cdots$.


(Let us allow x to denote any element of S', with the obvious convention that if
$x = a^{-1}$ then $x^{-1} = a$.) If we show that we can obtain every reduced form $w_0$ of w
by cancelling the pair $xx^{-1}$ first, then the proposition will follow by induction on the
shorter word $\cdots \cancel{x}\,\cancel{x^{-1}} \cdots$ thus obtained.

Let $w_0$ be a reduced form of w. We know that $w_0$ is obtained from w by some
sequence of cancellations. The first case is that our pair $xx^{-1}$ is cancelled at some step
in this sequence. Then we might as well rearrange the operations and cancel $xx^{-1}$
first. So this case is settled. On the other hand, the pair $xx^{-1}$ can not remain in $w_0$,
since $w_0$ is reduced. Therefore at least one of the two symbols must be cancelled at
some time. If the pair itself is not cancelled, then the first cancellation involving the
pair must look like

$\cdots \cancel{x^{-1}}\,\cancel{x}\,x^{-1} \cdots$ or $\cdots x\,\cancel{x^{-1}}\,\cancel{x} \cdots$.


Notice that the word obtained by this cancellation is the same as that obtained by
cancelling the original pair $xx^{-1}$. So we may cancel the original pair at this stage instead.
Then we are back in the first case, and the proposition is proved. □
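
Proposition (7.2) can be watched in action. In the Python sketch below, which is our own illustration, a lowercase letter stands for a symbol of S and the corresponding uppercase letter for its inverse; this encoding is an assumption made for the example. The first function cancels from left to right with a stack, and the second cancels a randomly chosen pair at each step. Both are applied to the word $babb^{-1}a^{-1}c^{-1}ca$ discussed above.

```python
import random


def reduce_word(word):
    """Cancel adjacent pairs x x^-1 and x^-1 x, scanning left to right with a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()            # the new letter cancels the previous one
        else:
            stack.append(letter)
    return "".join(stack)


def reduce_randomly(word, rng):
    """Cancel one randomly chosen adjacent pair at a time until none is left."""
    letters = list(word)
    while True:
        spots = [k for k in range(len(letters) - 1)
                 if letters[k] == letters[k + 1].swapcase()]
        if not spots:
            return "".join(letters)
        k = rng.choice(spots)
        del letters[k:k + 2]


w = "babBACca"  # b a b b^-1 a^-1 c^-1 c a
rng = random.Random(0)
results = {reduce_randomly(w, rng) for _ in range(200)}
print(reduce_word(w), results)
```

Whatever order of cancellation the random choices produce, the set of results contains a single word, in agreement with the proposition.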


Now we call two words w, w' in W' equivalent, and we write $w \sim w'$, if they
have the same reduced form. This is an equivalence relation.


(7.3) Proposition. The product of equivalent words is equivalent: If $w \sim w'$ and
$v \sim v'$, then $wv \sim w'v'$.


Proof. To obtain the reduced word equivalent to the product wv, we can first
cancel as much as possible in w and in v, to reduce w to $w_0$ and v to $v_0$. Then wv is
reduced to $w_0v_0$. Now we continue cancelling in $w_0v_0$ if possible. Since $w' \sim w$ and
$v' \sim v$, the same process, applied to w'v', passes through $w_0v_0$ too, and hence it
leads to the same reduced word. □


It follows from this proposition that equivalence classes of words may be mul- 
tiplied, that is, that there is a well-defined law of composition on the set of equiva- 
lence classes of words. 


(7.4) Proposition. Let F denote the set of equivalence classes of words in W’. 
Then F is a group with the law of composition induced from W’. 


Proof. The facts that multiplication is associative and that the class of the
empty word 1 is an identity follow from the corresponding facts in W'. It remains to
check that all elements of F are invertible. But clearly, if $w = xy \cdots z$ then the class
of $z^{-1} \cdots y^{-1}x^{-1}$ is the inverse of the class of w. □


(7.5) Definition. The group F of equivalence classes of words is called the free 
group on the set S. 


So an element of the free group F corresponds to exactly one reduced word in
$W'$, by Proposition (7.2). To multiply reduced words, combine and cancel:


$(abc^{-1})(cb) \rightsquigarrow abc^{-1}cb = abb$.


One can also introduce power notation for reduced words: $aaab^{-1}b^{-1} = a^3b^{-2}$.

The free group on the set S = {a} consisting of one element is the same as the 
set of all powers of a: F = {a"}. It is an infinite cyclic group. In contrast, the free 
group on a set S = {a,b} of two elements is very complicated. 
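
The rule "combine and cancel" can be sketched in a few lines of Python. This is only an illustration under an assumed encoding: a word is a list of pairs (symbol, exponent) with exponent +1 or -1, and the function names `reduce_word`, `multiply`, and `inverse` are hypothetical, not notation from the text.

```python
def reduce_word(word):
    """Return the reduced form of a word.

    A word is a list of (symbol, exponent) pairs with exponent +1 or -1,
    so ('a', -1) stands for a^{-1}. This encoding is an assumption of the
    sketch, not notation from the text.
    """
    stack = []
    for symbol, exponent in word:
        if stack and stack[-1] == (symbol, -exponent):
            stack.pop()            # cancel an adjacent pair x x^{-1} or x^{-1} x
        else:
            stack.append((symbol, exponent))
    return stack


def multiply(w, v):
    """Multiply two words: juxtapose, then cancel."""
    return reduce_word(list(w) + list(v))


def inverse(w):
    """Reverse the word and invert each symbol."""
    return [(symbol, -exponent) for symbol, exponent in reversed(w)]


a, b, c = ("a", 1), ("b", 1), ("c", 1)
c_inv = ("c", -1)
product = multiply([a, b, c_inv], [c, b])   # the product (abc^{-1})(cb)
```

Here `product` is the word abb of the example above. Because the reduced form is unique by Proposition (7.2), the order in which the stack happens to cancel pairs does not matter.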


8. GENERATORS AND RELATIONS 


Having described free groups, we now consider the more likely case that a set of 
generators of a group is not free—that there are some nontrivial relations among 
them. Our discussion is based on the mapping properties of the free group and of 
quotient groups. 

(8.1) Proposition. Mapping property of the free group: Let F be the free group on
a set $S = \{a, b, \ldots\}$, and let G be a group. Every map of sets $f\colon S \to G$ extends in a
unique way to a group homomorphism $\varphi\colon F \to G$. If we denote the image $f(x)$ of
an element $x \in S$ by $\bar{x}$, then $\varphi$ sends a word in $S' = \{a, a^{-1}, b, b^{-1}, \ldots\}$ to the
corresponding product of the elements $\{\bar{a}, \bar{a}^{-1}, \bar{b}, \bar{b}^{-1}, \ldots\}$ in G.


Proof. This rule does define a map on the set of words in $S'$. We must show
that equivalent words are sent to the same product in G. But since cancellation in a
word will not change the corresponding product in G, this is clear. Also, since
multiplication in F is defined by juxtaposition, the map $\varphi$ thus defined is a
homomorphism. It is the only way to extend f to a homomorphism. □


If S is any subset of a group G, the mapping property defines a homomorphism
$\varphi\colon F \to G$ from the free group on S to G. This reflects the fact that the elements of
S satisfy no relations in F except those implied by the group axioms, and explains
the reason for the adjective free.

A family S of elements is said to generate a group G if the map $\varphi$ from the free
group on S to G is surjective. This is the same as saying that every element of G is a
product of some string of elements of $S'$, so it agrees with the terminology
introduced in Section 2 of Chapter 2. In any case, whether or not S generates G, the
image of the homomorphism $\varphi$ of Proposition (8.1) is a subgroup called the subgroup
generated by S. This subgroup consists precisely of all products of elements of $S'$.
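
As an illustration of the mapping property, the following Python sketch extends a map f on two generators to words. Everything concrete here is an assumption made for the example: the target group is taken to be the permutations of {0, 1, 2} written as tuples, words are lists of (symbol, exponent) pairs, and the names `compose`, `invert`, and `extend` are hypothetical.

```python
def compose(p, q):
    """Product of two permutations given as tuples: first apply p, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def invert(p):
    """Inverse permutation."""
    result = [0] * len(p)
    for i, j in enumerate(p):
        result[j] = i
    return tuple(result)


def extend(f, identity):
    """Return the map phi on words determined by the map f on generators."""
    def phi(word):
        value = identity
        for symbol, exponent in word:
            g = f[symbol] if exponent == 1 else invert(f[symbol])
            value = compose(value, g)   # multiply the images in order
        return value
    return phi


identity = (0, 1, 2)
f = {"a": (1, 2, 0), "b": (1, 0, 2)}     # images of the generators a and b
phi = extend(f, identity)

u = [("a", 1), ("b", 1)]
v = [("b", -1), ("a", 1)]
```

Since cancelling a pair does not change the product, `phi(u + v)` agrees with the value of phi on the reduced word aa. The word aaa is sent to the identity, so in this example it is a relation among the chosen images.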

Assume that S generates G. The elements of S are then called generators.
Since $\varphi$ is a surjective homomorphism, the First Isomorphism Theorem [Chapter 2
(10.9)] tells us that G is isomorphic to the quotient group F/N, where $N = \ker \varphi$.
The elements of N are called relations among the generators. They are equivalence
classes of words w with the property that the corresponding product in G is 1:


$\varphi(w) = 1$ or $w = 1$ in G.


In the special case that N = {1}, $\varphi$ is an isomorphism. In this case G is called a free
group too.

If we know a set of generators and also all the relations, then we can compute 
in the isomorphic group F/N and hence in our group G. But the subgroup N will be 
infinite unless G is free, so we can’t list all its elements. Rather, a set of words 


$R = \{r_1, r_2, \ldots\}$


is called a set of defining relations for G if $R \subset N$ and if N is the smallest normal
subgroup containing R. This means that N is generated by the subset consisting of all
the words in R and also all their conjugates.

It might seem more systematic to require the defining relations to be generators
for the group N. But remember that the kernel of the homomorphism $F \to G$
defined by a set of generators is always a normal subgroup, so there is no need to
make the list of defining relations longer. If we know that some relation r = 1 holds
in G, then we can conclude that $xrx^{-1} = 1$ holds in G too, simply by multiplying
both sides of the equation on the left and right by x and $x^{-1}$.

We already know a few examples of generators and relations, such as the
dihedral group $D_n$ [Chapter 5 (3.6), (3.7)]. It is generated by the two elements x, y, with
relations


(8.2) $x^n = 1$, $y^2 = 1$, $xyxy = 1$.


(8.3) Proposition. The elements $x^n$, $y^2$, $xyxy$ form a set of defining relations for
the dihedral group.


This proposition is essentially what was checked in Chapter 5 (3.6). But to 
prove it formally, and to work freely with the concept of generators and relations, 
we will need what is called the mapping property of quotient groups. It is a general- 
ization of the First Isomorphism Theorem: 


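(8.4) Proposition. Mapping property of quotient groups: Let N be a normal
subgroup of G, let $\bar{G} = G/N$, and let $\pi$ be the canonical map $G \to \bar{G}$ defined by
$\pi(a) = \bar{a} = aN$. Let $\varphi\colon G \to G'$ be a homomorphism whose kernel contains N.
There is a unique homomorphism $\bar\varphi\colon \bar{G} \to G'$ such that $\bar\varphi\pi = \varphi$:


$$\begin{array}{ccc} G & \xrightarrow{\ \varphi\ } & G' \\ {\scriptstyle \pi}\downarrow & \nearrow {\scriptstyle \bar\varphi} & \\ \bar{G} & & \end{array}$$


This map is defined by the rule $\bar\varphi(\bar{a}) = \varphi(a)$.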


Proof. To define a map $\bar\varphi\colon \bar{G} \to G'$, we must define $\bar\varphi(\alpha)$ for every element
$\alpha$ of $\bar{G}$. To do this, we represent $\alpha$ by an element $a \in G$, choosing a so that
$\alpha = \pi(a)$. In the bar notation, this means that $\alpha = \bar{a}$. Now since we want our map
$\bar\varphi$ to satisfy the relation $\bar\varphi(\pi(a)) = \varphi(a)$, there is no choice but to define $\bar\varphi$ by the
rule $\bar\varphi(\alpha) = \varphi(a)$, as asserted in the proposition. To show that this is permissible,
we must show that the value we obtained for $\bar\varphi(\alpha)$, namely $\varphi(a)$, depends only on $\alpha$
and not on our choice of the representative a. This is often referred to as showing
that our map is "well-defined."

Let a and a' be two elements of G such that $\bar{a} = \bar{a}' = \alpha$. The equality $\bar{a} = \bar{a}'$
means that $aN = a'N$, or [Chapter 2 (5.13)] that $a' \in aN$. So $a' = an$ for some
$n \in N$. Since $N \subset \ker \varphi$ by hypothesis, $\varphi(n) = 1$. Thus $\varphi(a') = \varphi(a)\varphi(n) =
\varphi(a)$, as required.

Finally, the map $\bar\varphi$ is a homomorphism because
$\bar\varphi(\bar{a})\bar\varphi(\bar{b}) = \varphi(a)\varphi(b) = \varphi(ab) = \bar\varphi(\overline{ab})$. □


Proof of Proposition (8.3). We showed in Chapter 5 (3.6) that $D_n$ is generated
by elements x, y which satisfy (8.2). Therefore there is a surjective map
$\varphi\colon F \to D_n$ from the free group on x, y to $D_n$, and $R = \{x^n, y^2, xyxy\}$ is contained in
$\ker \varphi$. Let N be the smallest normal subgroup of F containing R. Then since $\ker \varphi$ is
a normal subgroup which contains R, $N \subset \ker \varphi$. The mapping property of
quotients gives us a homomorphism $\bar\varphi\colon F/N \to D_n$. If we show that $\bar\varphi$ is bijective, the
proposition will be proved.

Note that since $\varphi$ is surjective, $\bar\varphi$ is too. Also, in F/N the relations $x^n = 1$,
$y^2 = 1$, and $xyxy = 1$ hold. Using them, we can put any word in x, y into the form
$x^iy^j$, with $0 \le i \le n - 1$ and $0 \le j \le 1$. This shows that F/N has at most 2n
elements. Since $|D_n| = 2n$, it follows that $\bar\varphi$ is bijective, as required. □
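
The step "we can put any word into the form $x^iy^j$" can be made concrete. The sketch below is an illustration, not part of the proof: words are strings in which the uppercase letters X, Y stand for the inverses (an assumed convention), and the hypothetical function `normal_form` uses only $y^2 = 1$ and the consequence $yx = x^{-1}y$ of the relations (8.2).

```python
from itertools import product


def normal_form(word, n):
    """Reduce a word in x, y, X, Y to a pair (i, j) standing for x^i y^j.

    Only the relations x^n = 1, y^2 = 1, xyxy = 1 are used. X and Y denote
    the inverses of x and y; this encoding is an assumption of the sketch.
    """
    i, j = 0, 0
    for s in word:
        if s in "yY":
            j ^= 1                       # y^{-1} = y because y^2 = 1
        else:
            step = 1 if s == "x" else -1
            # x^i y x = x^{i-1} y, since yx = x^{-1}y follows from xyxy = 1
            i = (i + step) % n if j == 0 else (i - step) % n
    return i, j


n = 5
forms = {normal_form("".join(w), n)
         for length in range(7)
         for w in product("xy", repeat=length)}
```

For n = 5 the set `forms` contains exactly 2n = 10 pairs, in agreement with the count in the proof.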


We will use the notation

(8.5) $\langle x_1, \ldots, x_m;\ r_1, \ldots, r_k \rangle$


to denote the group generated by elements $x_1, \ldots, x_m$, with defining relations
$r_1, \ldots, r_k$. Thus


(8.6) $D_n = \langle x, y;\ x^n, y^2, xyxy \rangle$.


As a new example, let us consider the group generated by x, y, with the single
relation $xyx^{-1}y^{-1} = 1$. If x, y are elements of a group, then


(8.7) $xyx^{-1}y^{-1}$


is called their commutator. This commutator is important because it is equal to 1 if
and only if x and y commute. This is seen by multiplying both sides of the equation
$xyx^{-1}y^{-1} = 1$ on the right by yx. So if we impose the relation $xyx^{-1}y^{-1} = 1$ on the
free group, we will obtain a group in which x and y commute. Thus if N is the
smallest normal subgroup containing the commutator $xyx^{-1}y^{-1}$ and if G = F/N,
then the residues of x and y are commuting elements of G. This forces any two
elements of G to commute.


(8.8) Proposition. Let F be the free group on x, y and let N be the smallest
normal subgroup containing the commutator $xyx^{-1}y^{-1}$. The quotient group G = F/N
is abelian.


Proof. Let us denote the residues of the generators x, y in G by the same
letters. Since the commutator is in N, the elements x, y commute in G. Then x
commutes with $y^{-1}$ too. For $xy^{-1}$ and $y^{-1}x$ both become equal to x when multiplied on the
left by y. So by the Cancellation Law, they are equal. Also, x obviously commutes
with x and with $x^{-1}$. So x commutes with any word in $S' = \{x, x^{-1}, y, y^{-1}\}$. So does
y. It follows by induction that any two words in $S'$ commute. Since x, y generate the
group, G is commutative. □


Note this consequence: The commutator $uvu^{-1}v^{-1}$ of any two words in $S'$ is in
the normal subgroup generated by the single commutator $xyx^{-1}y^{-1}$, because, since
u, v commute in G, the commutator represents the identity element in G.

The group G constructed above is called the free abelian group on the set 
{x,y}, because the elements x,y satisfy no relations except those implied by the 
group axioms and the commutative law. 

In the examples we have seen, knowledge of the relations allows us to compute
easily in the group. This is somewhat misleading, because computation with a given
set of relations is often not easy at all. For example, suppose that we change the
defining relations (8.6) for the dihedral group slightly, substituting $y^3$ for $y^2$:


(8.9) $G = \langle x, y;\ x^n, y^3, xyxy \rangle$.


This group is much more complicated. When n > 5, it is an infinite group. 

Things become very difficult when the relations are complicated enough. Sup- 
pose that we are given a set R of words, and let N be the smallest normal subgroup 
containing R. Let w, w' be any other words. Then we can pose the problem of de- 
ciding whether or not w and w’ represent the same element of F/N. This is called 
the word problem for groups, and it is known that there is no general procedure for 
deciding it in a predictable length of time. Nevertheless, generators and relations al- 
low efficient computation in many cases, and so they are a useful tool. We will dis- 
cuss an important method for computation, the Todd—Coxeter Algorithm, in the 
next section. 

Recapitulating, when we speak of a group defined by generators S and relations 
R, we mean the quotient group F/N, where F is the free group on S and N is the 
smallest normal subgroup of F containing R. Note that any set R of relations will 
define a group, because F/N is always defined. The larger R is, the larger N becomes 
and the more collapsing takes place in the homomorphism $\pi\colon F \to F/N$. If R gets
“too big,” the worst that can happen is that N = F, hence that F/N is the trivial 
group. Thus there is no such thing as a contradictory set of relations. The only prob- 
lems which may arise occur when F/N becomes too small, which happens when the 
relations cause more collapsing than was expected. 


9. THE TODD-COXETER ALGORITHM 


Let H be a subgroup of a finite group G. The Todd—Coxeter Algorithm which is de- 
scribed in this section is an amazing direct method of counting the cosets of H in G 
and of determining the operation of G on the set of cosets. Since we know that any 
operation on an orbit looks like an operation on cosets [Chapter 5 (6.3)], the al- 
gorithm is really a method of describing any group operation. 

In order to compute explicitly, both the group G and the subgroup H must be 
given to us in an explicit way. So we consider a group 


(9.1) $G = \langle x_1, \ldots, x_m;\ r_1, \ldots, r_k \rangle$


presented by generators $x_1, \ldots, x_m$ and explicitly given relations $r_1, \ldots, r_k$, as in the
previous section. Thus G is realized as the quotient group F/N, where F is the free
group on the set $\{x_1, \ldots, x_m\}$ and N is the smallest normal subgroup containing
$\{r_1, \ldots, r_k\}$. We also assume that the subgroup H of G is given to us explicitly by a set
of words


(9.2) $\{h_1, \ldots, h_s\}$


in the free group F, whose images in G generate H. 

Let us work out a specific example to begin with. We take for G the group
generated by three elements x, y, z, with relations $x^3$, $y^2$, $z^2$, $xyz$, and for H the cyclic
subgroup generated by z:


(9.3) $G = \langle x, y, z;\ x^3, y^2, z^2, xyz \rangle$, $H = \{z\}$.


Since we will be determining the operation on cosets, which is a permutation 
representation [Chapter 5 (8.1)], we must decide how to write permutations. We 
will use the cycle notation of Section 6. This forces us to work with right cosets Hg 
rather than with left cosets, because we want G to operate on the right. Let us denote 
the set of right cosets of H in G by $\mathcal{C}$. We must also decide how to describe the
operation of our group explicitly, and the easiest way is to go back to the free group
again, that is, to describe the permutations associated to the given generators x, y, z. 

The operations of the generators on the set of cosets will satisfy these rules: 


(9.4) Rules. 


1. The operation of each generator (x, y, z in our example) is a permutation.
2. The relations ($x^3$, $y^2$, $z^2$, $xyz$ in our example) operate trivially.
3. The generators of H (z in our example) fix the coset H1.
4. The operation on cosets is transitive.


The first rule is a general property of group operations. It follows from the fact that
group elements are invertible. We list it instead of mentioning inverses of the
generators explicitly. The second rule holds because the relations represent 1 in G, and it
is the group G which operates. Rules 3 and 4 are special properties of the operation
on cosets.

We now determine the coset representation by applying only these rules. Let us 
use indices 1, 2,3,... to denote the cosets, with 1 standing for the coset H1. Since 
we don’t know how many cosets there are, we don’t know how many indices we 
need. We will add new ones as necessary. 

First, Rule 3 tells us that z sends 1 to itself: 1z = 1. This exhausts the
information in Rule 3, so Rules 1 and 2 take over. Rule 4 will appear only implicitly.

We don't know what x does to the index 1. Let's guess that $1x \neq 1$ and assign
a new index, say 1x = 2. Continuing with the generator x, we don't know 2x, so
we assign a third index: $1x^2 = 2x = 3$. Rule 2 now comes into play. It tells us that
$x^3$ fixes every index. Therefore $1x^3 = 3x = 1$. It is customary to sum up this
information in a table

        x  x  x
     1  2  3  1

which exhibits the operation of x on the three indices. The relation xxx appears on
the top, and Rule 2 is reflected in the fact that the same index 1 appears at both ends.
At this point, we have determined the operation of x on the three indices 1, 2, 3,
except for one thing: We don't yet know that these indices represent distinct cosets.

We now ask for the operation of y on the index 1. Again, we don't know it, so
we assign a new index, say 1y = 4. Rule 2 applies again. Since $y^2$ operates trivially,
we know that $1y^2 = 4y = 1$:

        y  y
     1  4  1

The remaining relation is xyz. We know that 1x = 2, but we don't yet know
2y. So we set 1xy = 2y = 5. Rule 2 then tells us that 1xyz = 5z = 1:

        x  y  z
     1  2  5  1

We now apply Rule 1: The operation of each group element is a permutation of the
indices. We have determined that 1z = 1 and also that 5z = 1. It follows that
5 = 1. We eliminate the index 5, replacing it by 1. This in turn tells us that 2y = 1.
On the other hand, we have already determined that 4y = 1. So 4 = 2 by Rule 1,
and we eliminate 4.
The entries in the table below have now been determined:


        x  x  x      y  y      z  z      x  y  z
     1  2  3  1      2  1      1  1      2  1  1
     2  3  1  2      1  2         2      3     2
     3  1  2  3         3         3      1  2  3


The bottom right corner shows that 2z = 3. This determines the rest of the table. 
There are three indices, and the operation is 


x = (123), y = (12), z = (23). 


Since there are three indices, we conclude that there are three cosets and that
the index of H in G is 3. We also conclude that the order of H is 2, and hence that G
has order 6. For $z^2 = 1$ is one of our relations; therefore z has order 1 or 2, and
since z does not operate trivially on the indices, $z \neq 1$. The three permutations listed
above generate the symmetric group, so the permutation representation is an
isomorphism from G onto $S_3$.

Of course, these conclusions depend on our knowing that the permutation rep- 
resentation we have constructed is the right one. We will show this at the end of the 
section. Let’s compute a few more examples first. 
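
As a sanity check, the four Rules (9.4) can be verified mechanically for the permutations x = (123), y = (12), z = (23) found above. The Python sketch below is only an illustration: the dictionaries and the helper `act` are hypothetical names, and permutations operate on the right, as in the text.

```python
x = {1: 2, 2: 3, 3: 1}   # the cycle (123)
y = {1: 2, 2: 1, 3: 3}   # the cycle (12)
z = {1: 1, 2: 3, 3: 2}   # the cycle (23)
perms = {"x": x, "y": y, "z": z}
indices = [1, 2, 3]


def act(index, word):
    """Apply the generators of a word one after another, on the right."""
    for s in word:
        index = perms[s][index]
    return index


# Rule 1: each generator operates as a permutation of the indices.
rule1 = all(sorted(p.values()) == indices for p in perms.values())

# Rule 2: each relation operates trivially.
relations = ["xxx", "yy", "zz", "xyz"]
rule2 = all(act(i, r) == i for r in relations for i in indices)

# Rule 3: the generator z of H fixes the index 1, which stands for H1.
rule3 = act(1, "z") == 1

# Rule 4: the operation is transitive, so the orbit of 1 is everything.
orbit, frontier = {1}, [1]
while frontier:
    i = frontier.pop()
    for p in perms.values():
        if p[i] not in orbit:
            orbit.add(p[i])
            frontier.append(p[i])
rule4 = orbit == set(indices)
```

All four flags come out true for this table. The check confirms only that the table is consistent with the rules; that it is the operation on cosets is the content of Theorem (9.10).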


(9.5) Example. Consider the tetrahedral group T of the 12 rotational symmetries of
a regular tetrahedron (see Section 9 of Chapter 5). If we let y and x denote
counterclockwise rotations by $2\pi/3$ about a vertex and the center of a face as shown below,
then yx = z is the rotation by $\pi$ about an edge. Thus the relations

(9.6) $x^3 = 1$, $y^3 = 1$, $yxyx = 1$

hold in T.


[Figure: a regular tetrahedron showing the rotations x and y.]


Let us show that (9.6) is a complete set of relations for T. To do so, we
consider the group $G = \langle y, x;\ y^3, x^3, yxyx \rangle$ defined by these relations. Since the
relations (9.6) hold in T, the mapping property of quotient groups provides a
homomorphism $\varphi\colon G \to T$. This map is surjective because, as is easily seen, y and x
generate T. We need only show that $\varphi$ is injective. We will do this by showing that
the order of the group G is 12.

It is possible to analyze the relations directly, but they aren't particularly easy
to work with. We could also compute the order of G by enumerating the cosets of
the trivial subgroup H = {1}. This is not efficient either. It is better to use a
nontrivial subgroup H of G, such as the one generated by y. This subgroup has order at
most 3 because $y^3 = 1$. If we show that its order is 3 and that its index in G is 4, it
will follow that G has order 12, and we will be done.

Here is the resulting table. To fill it in, work from both ends of the relations.

        x  x  x      y  y  y      y  x  y  x
     1  2  3  1      1  1  1      1  2  3  1
     2  3  1  2      3  4  2      3  1  1  2
     3  1  2  3      4  2  3      4  4  2  3
     4  4  4  4      2  3  4      2  3  4  4

Thus the permutation representation is

(9.7) x = (123), y = (234).


Since there are four indices, the index of H is 4. Also, notice that y does have order
precisely 3. For since $y^3 = 1$, the order is at most 3, and since the permutation
(234) associated to y has order 3, it is at least 3. So the order of the group is 12, as
predicted. Incidentally, we can derive the fact that T is isomorphic to the alternating
group $A_4$ by verifying that the permutations (9.7) generate that group. □
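
The last remark can be checked by brute force. The following Python sketch is an illustration with assumed conventions: the indices 1, 2, 3, 4 are renumbered 0, 1, 2, 3, permutations are tuples, and the names `compose`, `generated_group`, and `is_even` are hypothetical.

```python
def compose(p, q):
    """Product of two permutations given as tuples: first apply p, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def generated_group(generators):
    """All products of the given permutations (closure under multiplication)."""
    identity = tuple(range(len(generators[0])))
    elements, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def is_even(p):
    """A permutation is even when its number of inversions is even."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0


x = (1, 2, 0, 3)   # (123) with the indices shifted down by one
y = (0, 2, 3, 1)   # (234) with the indices shifted down by one
group = generated_group([x, y])
```

The set `group` has 12 elements, all of them even, so the two permutations of (9.7) generate the alternating group on four indices. In a finite group the inverse of an element is one of its powers, which is why closing under products alone is enough.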

(9.8) Example. We modify the relations (9.6) slightly. Let G be generated by x, y,
with relations


$x^3 = 1$, $y^3 = 1$, $yxy^2x = 1$,


and let H be the subgroup generated by y. Here is a start for a table. Since $y^3 = 1$,
we have shortened the last relation, substituting $y^{-1}$ for $y^2$. Clearly, $y^{-1}$ acts as the
inverse of the permutation associated to y. The entries in the bottom row have been
determined by working from the right side.


        x  x  x      y  y  y      y  x  y^{-1}  x
     1  2  3  1      1  1  1      1  2  3       1
     2  3  1  2            2      3  1  1       2


We rewrite the relation $2y^{-1} = 3$ as 3y = 2. Since 2y = 3 as well, it follows that
$3y^2 = 3$ and that $3y^3 = 2$. But $y^3 = 1$, so 3 = 2, which in turn implies 1 = 2 = 3.
Since the generators x, y fix 1, there is one coset, and H = G. Therefore x is a
power of y. The third relation shows that $x^2 = 1$. Combining this fact with the first
relation, we find x = 1. Thus G is a cyclic group of order 3. This example illustrates
how relations may collapse the group. □


In our examples, we have taken for H the subgroup generated by one of the 
chosen generators of G, but we could also make the computation with a subgroup H 
generated by an arbitrary set of words. They must be entered into the computation 
using Rule 3. 

This method can also be used when G is infinite, provided that the index 
[G:H] is finite. The procedure can not be expected to terminate if there are 
infinitely many cosets. 

We now address the question of why the procedure we have described does 
give the operation on cosets. A formal proof of this fact is not possible without first 
defining the algorithm formally, and we have not done this. So we will discuss the 
question informally. We describe the procedure this way: At a given stage of the 
computation, we will have some set I of indices, and the operation of some genera- 
tors of the group on some indices will have been determined. Let us call this a par- 
tial operation on I. A partial operation need not be consistent with Rules 1, 2, and 3, 
but it should be transitive; that is, all indices should be in the “partial orbit” of 1. 
This is where Rule 4 comes in. It tells us not to introduce any indices we don't need. 

The starting position is I = {1}, with no operations assigned. At any stage 
there are two possible steps: 


(9.9) 


(i) We may equate two indices $i, j \in I$ as a consequence of one of the first three
rules, or

(ii) we may choose a generator x and an index i such that ix has not yet been deter- 
mined and define ix = j, where j is a new index. 

We stop the process when an operation has been determined which is consistent with
the rules, that is, when we have a complete, consistent table and the rules hold.

There are two questions to ask: First, will this procedure terminate? Second, if
it terminates, is the operation the right one? The answer to both questions is yes. It
can be shown that the process always terminates, provided that the group is finite
and that preference is given to Step (i). We will not prove this. The more important
fact for applications is that if the process terminates, the resulting permutation
representation is the right one.
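
Since the procedure has only been described informally, the following Python sketch should be read as one possible way to organize Steps (i) and (ii), not as a definition of the algorithm. Its conventions are assumptions: words are strings, an uppercase letter stands for the inverse of the corresponding lowercase generator, rows are numbered from 0 internally, and the names `coset_table`, `define`, `equate`, and `scan` are hypothetical.

```python
def coset_table(gens, relators, subgroup_gens, limit=10_000):
    """Enumerate the right cosets of H in G = <gens; relators>.

    Words are strings. A lowercase letter is a generator and the matching
    uppercase letter is its inverse (a convention of this sketch only).
    Returns {index: {generator: index}}, with index 1 standing for H1.
    """
    inv = str.swapcase
    symbols = list(gens) + [inv(g) for g in gens]
    table = [{}]      # table[i][s] = i*s when known; rows are 0-based here
    parent = [0]      # parent[i] == i exactly when index i is still in use

    def rep(i):
        while parent[i] != i:
            i = parent[i]
        return i

    def define(i, s):
        # Step (ii): introduce a new index j with i*s = j.
        if len(table) >= limit:
            raise RuntimeError("too many indices; the index may be infinite")
        j = len(table)
        table.append({inv(s): i})
        parent.append(j)
        table[i][s] = j

    def equate(a, b):
        # Step (i): identify two indices and follow up the consequences.
        queue = []

        def merge(p, q):
            p, q = rep(p), rep(q)
            if p != q:
                parent[max(p, q)] = min(p, q)
                queue.append(max(p, q))

        merge(a, b)
        while queue:
            dead = queue.pop(0)
            for s in symbols:
                if s not in table[dead]:
                    continue
                d = table[dead].pop(s)
                table[d].pop(inv(s), None)     # drop the entry pointing back
                m, n = rep(dead), rep(d)
                if s in table[m]:
                    merge(n, table[m][s])      # Rule 1 forces another equality
                elif inv(s) in table[n]:
                    merge(m, table[n][inv(s)])
                else:
                    table[m][s], table[n][inv(s)] = n, m

    def scan(i, word):
        # Trace a word from index i, working from both ends (Rules 2 and 3).
        f, b, lo, hi = i, i, 0, len(word)
        while True:
            while lo < hi and word[lo] in table[f]:
                f, lo = table[f][word[lo]], lo + 1
            if lo == hi:
                if f != b:
                    equate(f, b)
                return
            while hi > lo and inv(word[hi - 1]) in table[b]:
                b, hi = table[b][inv(word[hi - 1])], hi - 1
            if hi == lo:
                equate(f, b)
                return
            if hi == lo + 1:                   # one gap left: fill it in
                table[f][word[lo]], table[b][inv(word[lo])] = b, f
                return
            define(f, word[lo])

    for h in subgroup_gens:
        scan(0, h)
    i = 0
    while i < len(table):
        for r in relators:
            if parent[i] == i:
                scan(i, r)
        if parent[i] == i:
            for s in symbols:
                if s not in table[i]:
                    define(i, s)
        i += 1
    live = [i for i in range(len(table)) if parent[i] == i]
    number = {old: new for new, old in enumerate(live, start=1)}
    return {number[i]: {g: number[table[i][g]] for g in gens} for i in live}


example = coset_table("xyz", ["xxx", "yy", "zz", "xyz"], ["z"])
```

For the group (9.3) the sketch returns three indices, with x, y, z operating as (123), (12), (23). The `limit` argument is a guard added for the sketch, because the procedure cannot be expected to stop when there are infinitely many cosets.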


(9.10) Theorem. Suppose that a finite number of repetitions of Steps (i) and (ii) 
yields a consistent table. Then the table defines a permutation representation which 
is isomorphic, by suitable numbering, to the representation on cosets. 


Sketch of proof. Let $I^*$ denote the final set of indices, with its operation. We
will prove the theorem by defining a bijective map $\varphi^*\colon I^* \to \mathcal{C}$ from this set to
the set of cosets which is compatible with the two operations. We define $\varphi^*$
inductively, by defining at each stage a map $\varphi\colon I \to \mathcal{C}$ from the set of indices
determined at that stage to $\mathcal{C}$, such that $\varphi$ is compatible with the partial operation on I. To
start, the map $\{1\} \to \mathcal{C}$ sends $1 \rightsquigarrow H1$. Now suppose that $\varphi\colon I \to \mathcal{C}$ has been defined,
and let $I'$ be the result of applying one of Steps (9.9) to I. In case of Step (ii), there
is no difficulty in extending $\varphi$ to a map $\varphi'\colon I' \to \mathcal{C}$. We simply define
$\varphi'(k) = \varphi(k)$ if $k \neq j$, and $\varphi'(j) = \varphi(i)x$. Next, suppose that we use Step (i) to
equate two indices, say i, j, so that I is collapsed to form the new index set $I'$. Then
the next lemma allows us to define the map $\varphi'\colon I' \to \mathcal{C}$:


(9.11) Lemma. Suppose that a map $\varphi\colon I \to \mathcal{C}$ is given, compatible with a
partial operation on I. Let $i, j \in I$, and suppose that one of the Rules 1, 2, or 3 forces
i = j. Then $\varphi(i) = \varphi(j)$.


Proof. This is true because, as we have already remarked, the operation on
cosets does satisfy all of the Rules (9.4). So if the rules force i = j, they also force
$\varphi(i) = \varphi(j)$. □


It remains to prove that the map $\varphi^*\colon I^* \to \mathcal{C}$ is bijective. To do this, we
construct the inverse map $\psi^*\colon \mathcal{C} \to I^*$, using the following lemma:


(9.12) Lemma. Let S be a set on which G operates, and let $s \in S$ be an element
stabilized by H. There is a unique map $\psi\colon \mathcal{C} \to S$ which is compatible with the
operations on the two sets and which sends $H1 \rightsquigarrow s$.


Proof. This proof repeats that of (6.4) in Chapter 5, except that we have
changed to right operations. Since g sends $H \rightsquigarrow Hg$ and since we want $\psi(Hg) =
\psi(H)g$, we must try to set $\psi(Hg) = sg$. This proves uniqueness of the map $\psi$. To
prove existence, we first check that the rule $\psi(Hg) = sg$ is well-defined: If
Ha = Hb, then $ba^{-1} \in H$. By hypothesis, $ba^{-1}$ stabilizes s, so sa = sb. Finally, $\psi$ is
compatible with the operations of G because $\psi(Hga) = sga = (sg)a = \psi(Hg)a$. □

Now, to prove the bijectivity of $\psi^*$, we use the lemma to construct a map
$\psi^*\colon \mathcal{C} \to I^*$. Consider the composed map $\varphi^*\psi^*\colon \mathcal{C} \to \mathcal{C}$. It sends $H1 \rightsquigarrow H1$.
We apply the lemma again, substituting $\mathcal{C}$ for S. The uniqueness assertion of the
lemma tells us that $\varphi^*\psi^*$ is the identity map. On the other hand, since the operation
on $I^*$ is transitive and since $\psi^*$ is compatible with the operations, $\psi^*$ must be
surjective. It follows that $\varphi^*$ and $\psi^*$ are bijective.


The axiomatic method has many advantages over honest work. 


Bertrand Russell 


EXERCISES 


1. The Operations of a Group on Itself


1. Does the rule $g, x \rightsquigarrow xg^{-1}$ define an operation of G on itself?

2. Let H be a subgroup of a group G. Then H operates on G by left multiplication. Describe 
the orbits for this operation. 

3. Prove the formula $|G| = |Z| + \sum |C|$, where the sum is over the conjugacy classes
containing more than one element and where Z is the center of G.

4. Prove the Fixed Point Theorem (1.12). 

5. Determine the conjugacy classes in the group M of motions of the plane. 


6. Rule out as many of the following as possible as Class Equations for a group of order 10: 
Piel 2, ee tr Se ee +4 a ot 2. 


7. Let F = F;. Determine the order of the conjugacy class of | | in GL2(Fs). 


8. Determine the Class Equation for each of the following groups. 
(a) the quaternion group, (b) the Klein four group, (c) the dihedral group Ds, 
(d) De, (e) Dn, (f) the group of upper triangular matrices in GL2(F,), 
(g) SL2(F3). 

9. Let G be a group of order n, and let F be any field. Prove that G is isomorphic to a sub- 
group of GL, (F). 

10. Determine the centralizer in GL3(R) of each matrix. 
1 1 Ieeal ss 

(a) 2 (b) I (c) 1 (d) H 


(e) 1] (f) ] 
! 1 
*11. Determine all finite groups which contain at most three conjugacy classes. 
12. Let N be a normal subgroup of a group G. Suppose that | N| = 5 and that |G| is odd. 
Prove that N is contained in the center of G. 
*13. (a) Determine the possible Class Equations for groups of order 8.
(b) Classify groups of order 8. 


14. Let Z be the center of a group G. Prove that if G/Z is a cyclic group, then G is abelian 
and hence G = Z. 
*15. Let G be a group of order 35.
(a) Suppose that G operates nontrivially on a set of five elements. Prove that G has a
normal subgroup of order 7.
(b) Prove that every group of order 35 is cyclic.


2. The Class Equation of the Icosahedral Group 


1. Identify the intersection $T \cap O$ when the dodecahedron and cube are as in Figure (2.7).


2. Two tetrahedra can be inscribed into a cube C, each one using half the vertices. Relate
this to the inclusion $A_4 \subset S_4$.


3. Does I contain a subgroup T? $D_6$? $D_3$?
4. Prove that the icosahedral group has no subgroup of order 30.
5. Prove or disprove: $A_5$ is the only proper normal subgroup of $S_5$.
6. Prove that no group of order $p^e$, where p is prime and e > 1, is simple.
7. Prove or disprove: An abelian group is simple if and only if it has prime order. 
8. (a) Determine the Class Equation for the group T of rotations of a tetrahedron. 
(b) What is the center of 7? 
(c) Prove that T has exactly one subgroup of order 4. 
(d) Prove that T has no subgroup of order 6. 
9. (a) Determine the Class Equation for the octahedral group O. 
(b) There are exactly two proper normal subgroups of O. Find them, show that they are 
normal, and show that there are no others. 

10. Prove that the tetrahedral group T is isomorphic to the alternating group $A_4$, and that the
octahedral group O is isomorphic to the symmetric group $S_4$. Begin by finding sets of
four elements on which these groups operate.

11. Prove or disprove: The icosahedral group is not a subgroup of the group of real upper tri- 
angular 2 X 2 matrices. 


*12. Prove or disprove: A nonabelian simple group can not operate nontrivially on a set con- 
taining fewer than five elements. 


3. Operations on Subsets 


1. Let S be the set of subsets of order 2 of the dihedral group $D_3$. Determine the orbits for
the action of $D_3$ on S by conjugation.


2. Determine the orbits for left multiplication and for conjugation on the set of subsets of 
order 3 of D3. 


3. List all subgroups of the dihedral group D,, and divide them into conjugacy classes. 


4. Let H be a subgroup of a group G. Prove that the orbit of the left coset gH for the opera- 
tion of conjugation contains the right coset Hg. 


5. Let U be a subset of a finite group G, and suppose that |U| and |G| have no common 
factor. Is the stabilizer of | U | trivial for the operation of conjugation? 


6. Consider the operation of left multiplication by G on the set of its subsets. Let U be a
subset whose orbit {gU} partitions G. Let H be the unique subset in this orbit which
contains 1. Prove that H is a subgroup of G and that the sets gU are its left cosets.

7. Let H be a subgroup of a group G. Prove or disprove: The normalizer N(H) is a normal
subgroup of the group G.

8. Let $H \subset K \subset G$ be groups. Prove that H is normal in K if and only if $K \subset N(H)$.

9. Prove that the subgroup B of upper triangular matrices in $GL_n(\mathbb{R})$ is conjugate to the
group L of lower triangular matrices.

10. Let B be the subgroup of $G = GL_n(\mathbb{C})$ of upper triangular matrices, and let $U \subset B$ be
the set of upper triangular matrices with diagonal entries 1. Prove that B = N(U) and
that B = N(B).

11. Let $S_n$ denote the subgroup of $GL_n(\mathbb{R})$ of permutation matrices. Determine the
normalizer of $S_n$ in $GL_n(\mathbb{R})$.

12. Let S be a finite set on which a group G operates transitively, and let U be a subset of S.
Prove that the subsets gU cover S evenly, that is, that every element of S is in the same
number of sets gU.

13. (a) Let H be a normal subgroup of G of order 2. Prove that H is in the center of G.
(b) Let H be a normal subgroup of prime order p in a finite group G. Suppose that p is
the smallest prime dividing |G|. Prove that H is in the center Z(G).

*14. Let H be a proper subgroup of a finite group G. Prove that the union of the conjugates of
H is not the whole group G.

15. Let K be a normal subgroup of order 2 of a group G, and let $\bar{G} = G/K$. Let $\bar{C}$ be a
conjugacy class in $\bar{G}$. Let S be the inverse image of $\bar{C}$ in G. Prove that one of the following
two cases occurs.
(a) S = C is a single conjugacy class and $|C| = 2|\bar{C}|$.
(b) $S = C_1 \cup C_2$ is made up of two conjugacy classes and $|C_1| = |C_2| = |\bar{C}|$.

16. Calculate the double cosets HgH of the subgroup H = {1, y} in the dihedral group $D_n$.
Show that each double coset has either two or four elements.

17. Let H, K be subgroups of G, and let H' be a conjugate subgroup of H. Relate the double
cosets H'gK and HgK.

18. What can you say about the order of a double coset HgK?


4. The Sylow Theorems 


1. How many elements of order 5 are contained in a group of order 20?
2. Prove that no group of order pq, where p and q are prime, is simple.
3. Prove that no group of order $p^2q$, where p and q are prime, is simple.


‘Prove that the set of matrices p ‘| where a,c € F7 and c = 1,2,4 forms a group of 


the type presented in (4.9b) (and that therefore such a group exists). 


5. Find Sylow 2-subgroups in the following cases:
(a) $D_{10}$ (b) T (c) O (d) I.


6. Find a Sylow p-subgroup of $GL_2(\mathbb{F}_p)$.


7. (a) Let H be a subgroup of G of prime index p. What are the possible numbers of
conjugate subgroups of H?
(b) Suppose that p is the smallest prime integer which divides |G|. Prove that H is a
normal subgroup.
8. Let H be a Sylow p-subgroup of G, and let K = N(H). Prove or disprove: K = N(K).
9. Let G be a group of order $p^em$. Prove that G contains a subgroup of order $p^r$ for every
integer $r \le e$.
10. Let n = pm be an integer which is divisible exactly once by p, and let G be a group
of order n. Let H be a Sylow p-subgroup of G, and let S be the set of all Sylow
p-subgroups. How does S decompose into H-orbits?
11. (a) Compute the order of $GL_n(\mathbb{F}_p)$.
(b) Find a Sylow p-subgroup of $GL_n(\mathbb{F}_p)$.
(c) Compute the number of Sylow p-subgroups.
(d) Use the Second Sylow Theorem to give another proof of the First Sylow Theorem.
*12. Prove that no group of order 224 is simple.
13. Prove that if G has order $n = p^ea$ where $1 \le a < p$ and $e \ge 1$, then G has a proper
normal subgroup.
14. Prove that the only simple groups of order < 60 are groups of prime order.
15. Classify groups of order 33.
16. Classify groups of order 18.
17. Prove that there are at most five isomorphism classes of groups of order 20.
18. Let G be a simple group of order 60.
(a) Prove that G contains six Sylow 5-subgroups, ten Sylow 3-subgroups, and five Sylow
2-subgroups.
(b) Prove that G is isomorphic to the alternating group $A_5$.


5. The Groups of Order 12


1. Determine the Class Equations of the groups of order 12.
2. Prove that a group of order $n = 2p$, where p is prime, is either cyclic or dihedral.
3. Let G be a group of order 30.
(a) Prove that either the Sylow 5-subgroup K or the Sylow 3-subgroup H is normal.
(b) Prove that HK is a cyclic subgroup of G.
(c) Classify groups of order 30.
4. Let G be a group of order 55.
(a) Prove that G is generated by two elements x, y, with the relations $x^{11} = 1$, $y^5 = 1$,
$yxy^{-1} = x^r$, for some r, $1 \le r < 11$.
(b) Prove that the following values of r are not possible: 2, 6, 7, 8, 10.
(c) Prove that the remaining values are possible, and that there are two isomorphism
classes of groups of order 55.


6. Computation in the Symmetric Group


1. Verify the products (6.9).
2. Prove explicitly that the permutation (123)(45) is conjugate to (241)(35).
3. Let p, q be permutations. Prove that the products pq and qp have cycles of equal sizes.
4. (a) Does the symmetric group $S_7$ contain an element of order 5? of order 10? of order 15?
(b) What is the largest possible order of an element of $S_7$?
5. Show how to determine whether a permutation is odd or even when it is written as a
product of cycles.
6. Prove or disprove: The order of a permutation is the least common multiple of the orders
of the cycles which make it up.
7. Is the cyclic subgroup H of $S_n$ generated by the cycle (12345) a normal subgroup?
8. Compute the number of permutations in $S_n$ which do not leave any index fixed.
9. Determine the cycle decomposition of the permutation $i \rightsquigarrow n - i$.
10. (a) Prove that every permutation p is a product of transpositions.
(b) How many transpositions are required to write the cycle $(1\,2\,3 \cdots n)$?
(c) Suppose that a permutation is written in two ways as a product of transpositions, say
$p = \tau_1\tau_2 \cdots \tau_m$ and $p = \tau_1'\tau_2' \cdots \tau_n'$. Prove that m and n are both odd or else they are
both even.
11. What is the centralizer of the element (12) of $S_4$?
12. Find all subgroups of order 4 of the symmetric group $S_4$. Which are normal?
13. Determine the Class Equation of $A_4$.
14. (a) Determine the number of conjugacy classes and the Class Equation for $S_5$.
(b) List the conjugacy classes in $A_5$, and reconcile this list with the list of conjugacy
classes in the icosahedral group [see (2.2)].
15. Prove that the transpositions $(1\,2), (2\,3), \ldots, (n-1, n)$ generate the symmetric group $S_n$.
16. Prove that the symmetric group $S_n$ is generated by the cycles $(1\,2 \cdots n)$ and $(1\,2)$.
17. (a) Show that the product of two transpositions $(ij)(kl)$ can always be written as a
product of 3-cycles. Treat the case that some indices are equal too.
(b) Prove that the alternating group $A_n$ is generated by 3-cycles, if $n \ge 3$.
18. Prove that if a proper normal subgroup of $S_n$ contains a 3-cycle, it is $A_n$.
19. Prove that $A_n$ is simple for all $n \ge 5$.
20. Prove that $A_n$ is the only subgroup of $S_n$ of index 2.
21. Explain the miraculous coincidence at the end of the section in terms of the opposite
group (Chapter 2, Section 1, exercise 12).


7. The Free Group 


1. Prove or disprove: The free group on two generators is isomorphic to the product of two
infinite cyclic groups.

2. (a) Let F be the free group on x, y. Prove that the two elements $u = x^2$ and $v = y^3$ generate
a subgroup of F which is isomorphic to the free group on u, v.
(b) Prove that the three elements $u = x^2$, $v = y^2$, and $z = xy$ generate a subgroup
isomorphic to the free group on u, v, z.


3. We may define a closed word in $S'$ to be the oriented loop obtained by joining the ends
of a word. Thus

[Figure: a word written around a loop]

represents a closed word, if we read it clockwise. Establish a bijective correspondence
between reduced closed words and conjugacy classes in the free group.
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4. Let p be a prime integer. Let N be the number of words of length p in a finite set S. 
Show that N is divisible by p. 


8. Generators and Relations


1. Prove that two elements a, b of a group generate the same subgroup as $bab^2$, $bab^3$.

2. Prove that the smallest normal subgroup of a group G containing a subset S is generated
as a subgroup by the set $\{gsg^{-1} \mid g \in G, s \in S\}$.

3. Prove or disprove: $y^2x^2$ is in the normal subgroup generated by $xy$ and its conjugates.

4. Prove that the group generated by x, y, z with the single relation $yxyz^{-2} = 1$ is actually a
free group.

5. Let S be a set of elements of a group G, and let $\{r_i\}$ be some relations which hold among
the elements S in G. Let F be the free group on S. Prove that the map $F \to G$ (8.1)
factors through $F/N$, where N is the normal subgroup generated by $\{r_i\}$.

6. Let G be a group with a normal subgroup N. Assume that G and G/N are both cyclic 
groups. Prove that G can be generated by two elements. 


7. A subgroup H of a group G is called characteristic if it is carried to itself by all automor- 
phisms of G. 
(a) Prove that every characteristic subgroup is normal. 
(b) Prove that the center Z of a group G is a characteristic subgroup. 
(c) Prove that the subgroup H generated by all elements of G of order n is characteristic. 

8. Determine the normal subgroups and the characteristic subgroups of the quaternion 
group. 

9. The commutator subgroup C of a group G is the smallest subgroup containing all 
commutators. 
(a) Prove that the commutator subgroup is a characteristic subgroup. 
(b) Prove that G/C is an abelian group. 

10. Determine the commutator subgroup of the group M of motions of the plane. 


11. Prove by explicit computation that the commutator $x(yz)x^{-1}(yz)^{-1}$ is in the normal subgroup
generated by the two commutators $xyx^{-1}y^{-1}$ and $xzx^{-1}z^{-1}$ and their conjugates.

12. Let G denote the free abelian group $\langle x, y;\ xyx^{-1}y^{-1} \rangle$ defined in (8.8). Prove the universal
property of this group: If u, v are elements of an abelian group A, there is a unique
homomorphism $\varphi: G \to A$ such that $\varphi(x) = u$, $\varphi(y) = v$.

13. Prove that the normal subgroup in the free group $\langle x, y \rangle$ which is generated by the single
commutator $xyx^{-1}y^{-1}$ is the commutator subgroup.

14. Let N be a normal subgroup of a group G. Prove that G/N is abelian if and only if N 
contains the commutator subgroup of G. 

15. Let $\varphi: G \to G'$ be a surjective group homomorphism. Let S be a subset of G such that
$\varphi(S)$ generates $G'$, and let T be a set of generators of $\ker \varphi$. Prove that $S \cup T$ generates
G.


16. Prove or disprove: Every finite group G can be presented by a finite set of generators and 
a finite set of relations. 


17. Let G be the group generated by x, y, z, with certain relations $\{r_i\}$. Suppose that one of
the relations has the form $wx$, where w is a word in y, z. Let $r_i'$ be the relation obtained
by substituting $w^{-1}$ for x into $r_i$, and let $G'$ be the group generated by y, z, with relations
$\{r_i'\}$. Prove that G and $G'$ are isomorphic.
9. The Todd-Coxeter Algorithm


1. 


*6. 
. Let G be the group generated by elements x, y, with relations x* = 1, y? = 1, x? = yxy. 


10. 


Prove that the elements x, y of (9.5) generate T, and that the permutations (9.7) generate
$A_4$.


. Use the Todd—Coxeter Algorithm to identify the group generated by two elements x, y, 


with the following relations. 


(a) x? = y? = 1, xvx = yxy 
(b) x? = y* = 1, xyx = yxy 
(c) x* = y? = 1, xyx = yxy 
(d) x* = y* = 1, xyx = yxy 
(e) x* = y* = x*y? = 1 


. Use the Todd—Coxeter Algorithm to determine the order of the group generated by x, y, 


with the following relations. 
(a) x*t=1,y2=I1,xy = yx (b) x’ = 1, y2 = 1, yx = xy. 


. Identify the group G generated by elements x,y,z, with relations x* = y* = z> = 


x?z* = 1 andz = xy. 


. Analyze the group G generated by x,y, with relations x* = 1, y* = 1, x? = y?, 


xy ver. 
Analyze the group generated by elements x, y, with relations x 'yx = y', y 'xy = x"!. 


Prove that this group is trivial in these two ways. 
(a) using the Todd—Coxeter Algorithm 
(b) working directly with the relations 


. Identify the group G generated by two elements x,y, with relations x* = y? = 


yxyxy = 1. 


9. Let $p \le q \le r$ be integers > 1. The triangle group $G^{pqr}$ is defined by generators and relations
$G^{pqr} = \langle x, y, z;\ x^p, y^q, z^r, xyz \rangle$. In each case, prove that the triangle group is isomorphic
to the group listed.
(a) the dihedral group $D_n$, when $p, q, r = 2, 2, n$
(b) the tetrahedral group, when $p, q, r = 2, 3, 3$
(c) the octahedral group, when $p, q, r = 2, 3, 4$
(d) the icosahedral group, when $p, q, r = 2, 3, 5$

10. Let $\Delta$ denote an isosceles right triangle, and let a, b, c denote the reflections of the plane
about the three sides of $\Delta$. Let $x = ab$, $y = bc$, $z = ca$. Prove that x, y, z generate a
triangle group.


11. (a) Prove that the group G generated by elements x, y, z with relations $x^2 = y^3 =
z^5 = 1$, $xyz = 1$ has order 60.
(b) Let H be the subgroup generated by x and $zyz^{-1}$. Determine the permutation
representation of G on $G/H$, and identify H.
(c) Prove that G is isomorphic to the alternating group $A_5$.
(d) Let K be the subgroup of G generated by x and $yxz$. Determine the permutation
representation of G on $G/K$, and identify K.


Miscellaneous Problems
1. (a) Prove that the subgroup $T'$ of $O_3$ of all symmetries of a regular tetrahedron, including
orientation-reversing symmetries, has order 24.
(b) Is $T'$ isomorphic to the symmetric group $S_4$?
(c) State and prove analogous results for the group of symmetries of a dodecahedron.
2. (a) Let $U = \{1, x\}$ be a subset of order 2 of a group G. Consider the graph having one
vertex for each element of G and an edge joining the vertices g to gx for all $g \in G$.
Prove that the vertices connected to the vertex 1 are the elements of the cyclic group
generated by x.
(b) Do the analogous thing for the set $U = \{1, x, y\}$.


*3. (a) Suppose that a group G operates transitively on a set S, and that H is the stabilizer of
an element $s_0 \in S$. Consider the action of G on $S \times S$ defined by $g(s_1, s_2) =
(gs_1, gs_2)$. Establish a bijective correspondence between double cosets of H in G and
G-orbits in $S \times S$.
(b) Work out the correspondence explicitly for the case that G is the dihedral group $D_5$
and S is the set of vertices of a 5-gon.
(c) Work it out for the case that $G = T$ and that S is the set of edges of a tetrahedron.

*4. Assume that $H \subset K \subset G$ are subgroups, that H is normal in K, and that K is normal in
G. Prove or disprove: H is normal in G.

*5. Prove the Bruhat decomposition, which asserts that $GL_n(\mathbb{R})$ is the union of the double
cosets $BPB$, where B is the group of upper triangular matrices and P is a permutation
matrix.

6. (a) Prove that the group generated by x, y with relations $x^2$, $y^2$ is an infinite group in two
ways:
(i) It is clear that every word can be reduced by using these relations to the form
$\cdots xyxy \cdots$. Prove that every element of G is represented by exactly one such
word.
(ii) Exhibit G as the group generated by reflections $r, r'$ about lines $\ell, \ell'$ whose
angle of intersection is not a rational multiple of $2\pi$.
(b) Let N be any proper normal subgroup of G. Prove that $G/N$ is a dihedral group.


7. Let H, N be subgroups of a group G, and assume that N is a normal subgroup.
(a) Determine the kernels of the restrictions of the canonical homomorphism
$\pi: G \to G/N$ to the subgroups H and HN.
(b) Apply the First Isomorphism Theorem to these restrictions to prove the
Second Isomorphism Theorem: $H/(H \cap N)$ is isomorphic to $(HN)/N$.
8. Let H, N be normal subgroups of a group G such that $H \supset N$, and let $\bar{H} = H/N$,
$\bar{G} = G/N$.
(a) Prove that $\bar{H}$ is a normal subgroup of $\bar{G}$.
(b) Use the composed homomorphism $G \to \bar{G} \to \bar{G}/\bar{H}$ to prove the
Third Isomorphism Theorem: $G/H$ is isomorphic to $\bar{G}/\bar{H}$.


Chapter 7 


Bilinear Forms 


I presume that to the uninitiated 
the formulae will appear cold and cheerless. 


Benjamin Pierce 


1. DEFINITION OF BILINEAR FORM


Our model for bilinear forms is the dot product

(1.1) $(X \cdot Y) = X^tY = x_1y_1 + \cdots + x_ny_n$

of vectors in $\mathbb{R}^n$, which was described in Section 5 of Chapter 4. The symbol $(X \cdot Y)$
has various properties, the most important for us being the following:

(1.2) Bilinearity: $(X_1 + X_2 \cdot Y) = (X_1 \cdot Y) + (X_2 \cdot Y)$
                   $(X \cdot Y_1 + Y_2) = (X \cdot Y_1) + (X \cdot Y_2)$
                   $(cX \cdot Y) = c(X \cdot Y) = (X \cdot cY)$
      Symmetry:    $(X \cdot Y) = (Y \cdot X)$
      Positivity:  $(X \cdot X) > 0$, if $X \neq 0$.


Notice that bilinearity says this: If one variable is fixed, the resulting function of the
remaining variable is a linear transformation $\mathbb{R}^n \to \mathbb{R}$.

We will study dot product and its analogues in this chapter. It is clear how to 
generalize bilinearity and symmetry to a vector space over any field, while positivity 
is, a priori, applicable only when the scalar field is R. We will also extend the con- 
cept of positivity to complex vector spaces in Section 4. 
Let V be a vector space over a field F. A bilinear form on V is a function of two
variables on V, with values in the field: $V \times V \xrightarrow{f} F$, satisfying the bilinear axioms,
which are

(1.3) $f(v_1 + v_2, w) = f(v_1, w) + f(v_2, w)$
      $f(cv, w) = cf(v, w)$
      $f(v, w_1 + w_2) = f(v, w_1) + f(v, w_2)$
      $f(v, cw) = cf(v, w)$


for all $v, w, v_i, w_i \in V$ and all $c \in F$. Often a notation similar to dot product is
used. We will frequently use the notation

(1.4) $\langle v, w \rangle$

to designate the value $f(v, w)$ of the form. So $\langle v, w \rangle$ is a scalar, an element of F.
A form $\langle\ ,\ \rangle$ is said to be symmetric if

(1.5) $\langle v, w \rangle = \langle w, v \rangle$

and skew-symmetric if

(1.6) $\langle v, w \rangle = -\langle w, v \rangle$,

for all $v, w \in V$. (This is actually not the right definition of skew-symmetry if the
field F is of characteristic 2, that is, if $1 + 1 = 0$ in F. We will correct the definition
in Section 8.)

If the form f is either symmetric or skew-symmetric, then linearity in the sec- 
ond variable follows from linearity in the first. 

The main examples of bilinear forms are the forms on the space $F^n$ of column
vectors, obtained as follows: Let A be an $n \times n$ matrix in F, and define

(1.7) $\langle X, Y \rangle = X^tAY$.

Note that this product is a $1 \times 1$ matrix, that is, a scalar, and that it is bilinear.
Ordinary dot product is included as the case $A = I$.


A matrix A is symmetric if

(1.8) $A^t = A$, that is, $a_{ij} = a_{ji}$ for all i, j.


(1.9) Proposition. The form (1.7) is symmetric if and only if the matrix A is symmetric.


Proof. Assume that A is symmetric. Since $Y^tAX$ is a $1 \times 1$ matrix, it is equal to
its transpose: $Y^tAX = (Y^tAX)^t = X^tA^tY = X^tAY$. Thus $\langle Y, X \rangle = \langle X, Y \rangle$. The other
implication is obtained by setting $X = e_i$ and $Y = e_j$. We find $\langle e_i, e_j \rangle = e_i^tAe_j = a_{ij}$,
while $\langle e_j, e_i \rangle = a_{ji}$. If the form is symmetric, then $a_{ij} = a_{ji}$, and so A is symmetric. □


Let $\langle\ ,\ \rangle$ be a bilinear form on a vector space V, and let $\mathbf{B} = (v_1, \ldots, v_n)$ be a
basis for V. We can relate the form to a product $X^tAY$ by the matrix of the form with
respect to the basis. By definition, this is the matrix $A = (a_{ij})$, where

(1.10) $a_{ij} = \langle v_i, v_j \rangle$.


Note that A is a symmetric matrix if and only if $\langle\ ,\ \rangle$ is a symmetric form. Also, the
symmetry of the bilinear form does not depend on the basis. So if the matrix of the
form with respect to some basis is symmetric, its matrix with respect to any other
basis will be symmetric too.

The matrix A allows us to compute the value of the form on two vectors
$v, w \in V$. Let X, Y be their coordinate vectors, as in Section 4 of Chapter 3, so that
$v = \mathbf{B}X$, $w = \mathbf{B}Y$. Then

$\langle v, w \rangle = \left\langle \sum_i v_ix_i,\ \sum_j v_jy_j \right\rangle$.

This expands using bilinearity to $\sum_{i,j} x_iy_j\langle v_i, v_j \rangle = \sum_{i,j} x_ia_{ij}y_j = X^tAY$:

(1.11) $\langle v, w \rangle = X^tAY$.


Thus, if we identify $F^n$ with V using the basis $\mathbf{B}$ as in Chapter 3 (4.14), the bilinear
form $\langle\ ,\ \rangle$ corresponds to $X^tAY$.
As in the study of linear operators, a central problem is to describe the effect
of a change of basis on such a product. For example, we would like to know what
happens to dot product when the basis of $\mathbb{R}^n$ is changed. This will be discussed
presently. The effect of a change of basis $\mathbf{B} = \mathbf{B}'P$ [Chapter 3 (4.16)] on the matrix
of the form can be determined easily from the rules $X' = PX$, $Y' = PY$: If $A'$ is the
matrix of the form with respect to a new basis $\mathbf{B}'$, then by definition of $A'$,
$\langle v, w \rangle = X'^tA'Y' = X^tP^tA'PY$. But we also have $\langle v, w \rangle = X^tAY$. So

(1.12) $P^tA'P = A$.


Let $Q = (P^{-1})^t$. Since P can be any invertible matrix, Q is also arbitrary.


(1.13) Corollary. Let A be the matrix of a bilinear form with respect to a basis.
The matrices $A'$ which represent the same form with respect to different bases are the
matrices $A' = QAQ^t$, where Q is an arbitrary matrix in $GL_n(F)$. □

Let us now apply formula (1.12) to our original example of dot product on $\mathbb{R}^n$.
The matrix of the dot product with respect to the standard basis is the identity
matrix: $(X \cdot Y) = X^tIY$. So formula (1.12) tells us that if we change basis, the matrix of
the form changes to

(1.14) $A' = (P^{-1})^tI(P^{-1}) = (P^{-1})^t(P^{-1})$,

where P is the matrix of change of basis as before. If the matrix P happens to be
orthogonal, meaning that $P^tP = I$, then $A' = I$, and dot product carries over to dot
product: $(X \cdot Y) = (PX \cdot PY) = (X' \cdot Y')$, as we saw in Chapter 4 (5.13). But under a
general change of basis, the formula for dot product changes to $X'^tA'Y'$, where $A'$ is
as in (1.14). For example, let n = 2, and let the basis $\mathbf{B}'$ be


_ fo 
7) ai and ts =| 


SENe aly 
Then y. y! (p') [p 


> 
ral ee ae laa Octal 2 
(1.15) P =| and A =|: dh al ' 


The matrix $A'$ represents dot product on $\mathbb{R}^2$, with respect to the basis $\mathbf{B}'$.
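
Formula (1.14) is easy to try out numerically. The following Python sketch is only an illustration: the basis $(1, 1)^t$, $(0, 1)^t$ is chosen here for the example and is not claimed to be the basis of (1.15), and the helper names are made up. It uses the fact that, since $\mathbf{B} = \mathbf{B}'P$ with $\mathbf{B}$ the standard basis, the columns of $P^{-1}$ are the new basis vectors.

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]

def dot_product_matrix(basis):
    """Matrix A' = (P^-1)^t (P^-1) of dot product with respect to `basis`.

    `basis` is a list of vectors; they become the columns of P^-1.
    """
    P_inv = transpose([[Fraction(x) for x in v] for v in basis])
    return matmul(transpose(P_inv), P_inv)

# Illustrative basis (an assumption made for this example only).
basis = [[1, 1], [0, 1]]
A_prime = dot_product_matrix(basis)

# The entries of A' are the dot products of the basis vectors, as in (1.10).
gram = [[sum(a * b for a, b in zip(u, v)) for v in basis] for u in basis]
```

Here `A_prime` and `gram` agree entry by entry, which is formula (1.10) applied to dot product.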

We can also turn the computation around. Suppose that we are given a bilinear
form $\langle\ ,\ \rangle$ on a real vector space V. Let us ask whether or not this form becomes dot
product when we choose a suitable basis. We start with an arbitrary basis $\mathbf{B}$, so that
we have a matrix A to work with. Then the problem is to change this basis in such a
way that the new matrix is the identity, if that is possible. By formula (1.12), this
amounts to solving the matrix equation $I = (P^{-1})^tA(P^{-1})$, or

(1.16) $A = P^tP$.


(1.17) Corollary. The matrices A which represent a form equivalent to dot
product are the matrices $A = P^tP$, where P is invertible. □


This corollary gives a theoretical answer to our problem of determining the
bilinear forms equivalent to dot product, but it is not very satisfactory because we
don't yet have a practical method of deciding which matrices can be written as a
product $P^tP$, let alone a practical method of finding P.

We can get some conditions on the matrix A from the properties of dot product
listed in (1.2). Bilinearity imposes no condition on A, because the symbol $X^tAY$ is
always bilinear. However, symmetry and positivity restrict the possibilities. The easier
property to check is symmetry: In order to represent dot product, the matrix A must
be symmetric. Positivity is also a strong restriction. In order to represent dot
product, the matrix A must have the property that

(1.18) $X^tAX > 0$, for all $X \neq 0$.

A real symmetric matrix having this property is called positive definite.


(1.19) Theorem. The following properties of a real $n \times n$ matrix A are equivalent:

(i) A represents dot product, with respect to some basis of $\mathbb{R}^n$.
(ii) There is an invertible matrix $P \in GL_n(\mathbb{R})$ such that $A = P^tP$.
(iii) A is symmetric and positive definite.

We have seen that (i) and (ii) are equivalent [Corollary (1.17)] and that (i) implies
(iii). So it remains to prove the remaining implication, that (iii) implies (i). It will be
more convenient to restate this implication in vector space form.
A symmetric bilinear form $\langle\ ,\ \rangle$ on a finite-dimensional real vector space V is
called positive definite if

(1.20) $\langle v, v \rangle > 0$

for every nonzero vector $v \in V$. Thus a real symmetric matrix A is positive definite
if and only if the form $\langle X, Y \rangle = X^tAY$ it defines on $\mathbb{R}^n$ is a positive definite form.
Also, the form $\langle\ ,\ \rangle$ is positive definite if and only if its matrix A with respect to any
basis is a positive definite matrix. This is clear, because if X is the coordinate vector
of a vector v, then $\langle v, v \rangle = X^tAX$ (1.11).

Two vectors v, w are called orthogonal with respect to a symmetric form if
$\langle v, w \rangle = 0$. Orthogonality of two vectors is often denoted as

(1.21) $v \perp w$.


This definition extends the concept of orthogonality which we have already seen
when the form is dot product on $\mathbb{R}^n$ [Chapter 4 (5.12)]. A basis $\mathbf{B} = (v_1, \ldots, v_n)$ of V
is called an orthonormal basis with respect to the form if

$\langle v_i, v_j \rangle = 0$ for all $i \neq j$, and $\langle v_i, v_i \rangle = 1$ for all i.

It follows directly from the definition that a basis $\mathbf{B}$ is orthonormal if and only if the
matrix of the form with respect to $\mathbf{B}$ is the identity matrix.


(1.22) Theorem. Let $\langle\ ,\ \rangle$ be a positive definite symmetric form on a finite-dimensional
real vector space V. There exists an orthonormal basis for V.


Proof. We will describe a method called the Gram-Schmidt procedure for
constructing an orthonormal basis, starting with an arbitrary basis $\mathbf{B} = (v_1, \ldots, v_n)$.
Our first step is to normalize $v_1$, so that $\langle v_1, v_1 \rangle = 1$. To do this we note that

(1.23) $\langle cv, cv \rangle = c^2\langle v, v \rangle$.

Since the form is positive definite, $\langle v_1, v_1 \rangle > 0$. We set $c = \langle v_1, v_1 \rangle^{-1/2}$, and replace
$v_1$ by $w_1 = cv_1$.

Next we look for a linear combination of $w_1$ and $v_2$ which is orthogonal to $w_1$.
The required linear combination is $w = v_2 - aw_1$, where $a = \langle v_2, w_1 \rangle$: $\langle w, w_1 \rangle =
\langle v_2, w_1 \rangle - a\langle w_1, w_1 \rangle = \langle v_2, w_1 \rangle - a = 0$. We normalize this vector w to length 1,
obtaining a vector $w_2$ which we substitute for $v_2$. The geometric interpretation of this
operation is illustrated below for the case that the form is dot product. The vector
$aw_1$ is the orthogonal projection of $v_2$ onto the subspace (the line) spanned by $w_1$.

[Figure: the vectors $v_2$, $w$, $w_2$, $w_1$, and the projection $aw_1$ of $v_2$ onto the line spanned by $w_1$]

This is the general procedure. Suppose that the $k - 1$ vectors $w_1, \ldots, w_{k-1}$ are
orthonormal and that $(w_1, \ldots, w_{k-1}, v_k, \ldots, v_n)$ is a basis. We adjust $v_k$ as follows: We
let $a_i = \langle v_k, w_i \rangle$ and

(1.24) $w = v_k - a_1w_1 - a_2w_2 - \cdots - a_{k-1}w_{k-1}$.

Then w is orthogonal to $w_i$ for $i = 1, \ldots, k - 1$, because

$\langle w, w_i \rangle = \langle v_k, w_i \rangle - a_1\langle w_1, w_i \rangle - a_2\langle w_2, w_i \rangle - \cdots - a_{k-1}\langle w_{k-1}, w_i \rangle$.

Since $w_1, \ldots, w_{k-1}$ are orthonormal, all the terms $\langle w_j, w_i \rangle$, $1 \le j \le k - 1$, are zero
except for the term $\langle w_i, w_i \rangle$, which is 1. So the sum reduces to

$\langle w, w_i \rangle = \langle v_k, w_i \rangle - a_i\langle w_i, w_i \rangle = \langle v_k, w_i \rangle - a_i = 0$.

We normalize the length of w to 1, obtaining a vector $w_k$ which we substitute for $v_k$
as before. Then $(w_1, \ldots, w_k)$ is orthonormal. Since $v_k$ is in the span of $(w_1, \ldots, w_k;
v_{k+1}, \ldots, v_n)$, this set is a basis. The existence of an orthonormal basis follows by
induction on k. □
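
The Gram-Schmidt procedure translates directly into a short program. The sketch below assumes the form is given on $\mathbb{R}^n$ by a positive definite symmetric matrix A, as in (1.7); the function names and the sample matrix are chosen here for illustration, and floating-point arithmetic is used, so the result is orthonormal only up to rounding error.

```python
import math

def form(A, x, y):
    """Value <x, y> = x^t A y of the bilinear form with matrix A."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def gram_schmidt(A, basis):
    """Orthonormalize `basis` with respect to the positive definite form x^t A y."""
    ws = []
    for v in basis:
        w = list(v)
        for u in ws:
            a = form(A, v, u)                      # a_i = <v_k, w_i>, as in (1.24)
            w = [wi - a * ui for wi, ui in zip(w, u)]
        c = 1.0 / math.sqrt(form(A, w, w))         # <w, w> > 0 by positive definiteness
        ws.append([c * wi for wi in w])            # normalize, as in (1.23)
    return ws

# Illustrative positive definite matrix and starting basis (assumptions of this example).
A = [[2.0, 1.0], [1.0, 1.0]]
ws = gram_schmidt(A, [[1.0, 0.0], [0.0, 1.0]])
```

The vectors in `ws` are orthonormal for the form defined by A, though not for ordinary dot product.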


End of the proof of Theorem (1.19). The fact that part (iii) of Theorem (1.19) implies
(i) follows from Theorem (1.22). For if A is symmetric and positive definite,
then the form $\langle X, Y \rangle = X^tAY$ it defines on $\mathbb{R}^n$ is also symmetric and positive definite.
In that case, Theorem (1.22) tells us that there is a basis $\mathbf{B}'$ of $\mathbb{R}^n$ which is orthonormal
with respect to the form $\langle X, Y \rangle = X^tAY$. (But the basis will probably not be
orthonormal with respect to the usual dot product on $\mathbb{R}^n$.) Now on the one hand, the
matrix $A'$ of the form $\langle X, Y \rangle$ with respect to the new basis $\mathbf{B}'$ satisfies the relation
$P^tA'P = A$ (1.12), and on the other hand, since $\mathbf{B}'$ is orthonormal, $A' = I$. Thus
$A = P^tP$. This proves (ii), and since (i) and (ii) are already known to be equivalent,
it also proves (i). □


Unfortunately, there is no really simple way to show that a matrix is positive
definite. One of the most convenient criteria is the following: Denote the upper left
$i \times i$ submatrix of A by $A_i$. Thus

$A_1 = [a_{11}]$, $A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, $A_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$, ..., $A_n = A$.


(1.25) Theorem. A real symmetric $n \times n$ matrix A is positive definite if and only
if the determinant $\det A_i$ is positive for each $i = 1, \ldots, n$.


For example, the $2 \times 2$ matrix

(1.26) $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

is positive definite if and only if $a > 0$ and $ad - bc > 0$. Using this criterion, we
can check immediately that the matrix $A'$ of (1.15) is positive definite, which agrees
with the fact that it represents dot product.

The proof of Theorem (1.25) is at the end of the next section. 
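
The criterion of Theorem (1.25) is simple to program. The following sketch assumes the matrix has integer or rational entries, so that exact arithmetic with `Fraction` can be used; the function names and the test matrices are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    d = Fraction(1)
    for k in range(n):
        # Find a row with a nonzero entry in column k to use as pivot.
        pivot = next((r for r in range(k, n) if M[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            d = -d                      # a row swap changes the sign
        d *= M[k][k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            M[r] = [a - factor * b for a, b in zip(M[r], M[k])]
    return d

def is_positive_definite(A):
    """Test of (1.25): A symmetric and det A_i > 0 for each upper left block A_i."""
    n = len(A)
    symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))
    return symmetric and all(
        det([row[:i] for row in A[:i]]) > 0 for i in range(1, n + 1)
    )
```

For instance, the matrix with rows (2, 1), (1, 1) passes the test, while the matrix with rows (1, 2), (2, 1) fails it because its determinant is negative.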
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2. SYMMETRIC FORMS: ORTHOGONALITY 


In this section, we consider a finite-dimensional real vector space V on which a sym- 
metric bilinear form (,) is given, but we drop the assumption made in the last sec- 
tion that the form is positive definite. A form such that (v, v) takes on both positive 
and negative values is called indefinite. The Lorentz form 


$X^tAY = x_1y_1 + x_2y_2 + x_3y_3 - c^2x_4y_4$


of physics is a typical example of an indefinite form on "space-time" $\mathbb{R}^4$. The
coefficient c representing the speed of light can be normalized to 1, and then the
matrix of the form with respect to the given basis becomes

(2.1) $\begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{bmatrix}$.

We now pose the problem of describing all symmetric forms on a finite-dimen- 
sional real vector space. The basic notion used in the study of such a form is still that 
of orthogonality. But if a form is not positive definite, it may happen that a nonzero 
vector v is self-orthogonal: (v,v) = 0. For example, this is true for the vector 
$(1, 0, 0, 1)^t \in \mathbb{R}^4$, when the form is defined by (2.1). So we must revise our
geometric intuition. It turns out that there is no need to worry about this point. There are
enough vectors which are not self-orthogonal to serve our purposes. 


(2.2) Proposition. Suppose the symmetric form $\langle\ ,\ \rangle$ is not identically zero. Then
there is a vector $v \in V$ which is not self-orthogonal: $\langle v, v \rangle \neq 0$.


Proof. To say that $\langle\ ,\ \rangle$ is not identically zero means that there is a pair of
vectors $v, w \in V$ such that $\langle v, w \rangle \neq 0$. Take these vectors. If $\langle v, v \rangle \neq 0$, or if
$\langle w, w \rangle \neq 0$, then the proposition is verified. Suppose $\langle v, v \rangle = \langle w, w \rangle = 0$. Let
$u = v + w$, and expand $\langle u, u \rangle$ using bilinearity:

$\langle u, u \rangle = \langle v + w, v + w \rangle = \langle v, v \rangle + \langle v, w \rangle + \langle w, v \rangle + \langle w, w \rangle = 0 + 2\langle v, w \rangle + 0$.

Since $\langle v, w \rangle \neq 0$, it follows that $\langle u, u \rangle \neq 0$. □

If W is a subspace of V, then we will denote by $W^\perp$ the set of all vectors v
which are orthogonal to every $w \in W$:

(2.3) $W^\perp = \{v \in V \mid \langle v, W \rangle = 0\}$.

This is a subspace of V, called the orthogonal complement to W.


(2.4) Proposition. Let $w \in V$ be a vector such that $\langle w, w \rangle \neq 0$. Let $W = \{cw\}$
be the span of w. Then V is the direct sum of W and its orthogonal complement:
$V = W \oplus W^\perp$.

Proof. According to Chapter 3 (6.4, 6.5), we have to show two things:

(a) $W \cap W^\perp = 0$. This is clear. The vector $cw$ is not orthogonal to w unless
$c = 0$, because $\langle cw, w \rangle = c\langle w, w \rangle$ and $\langle w, w \rangle \neq 0$.

(b) W and $W^\perp$ span V: Every vector $v \in V$ can be written in the form
$v = aw + v'$, where $v' \in W^\perp$. To show this, we solve the equation
$\langle v - aw, w \rangle = 0$ for a: $\langle v - aw, w \rangle = \langle v, w \rangle - a\langle w, w \rangle = 0$. The solution is
$a = \dfrac{\langle v, w \rangle}{\langle w, w \rangle}$. We set $v' = v - aw$. □

Two more concepts which we will need are the null space of a symmetric form 
and nondegenerate form. A vector v € V is called a null vector for the given form if 
(v,w) = 0 for all w € V, that is, if v is orthogonal to the whole space V. The null 
space of the form is the set of all null vectors 


(2.5) $N = \{v \mid \langle v, V \rangle = 0\} = V^\perp$.


A symmetric form is said to be nondegenerate if the null space is {0}. 


(2.6) Proposition. Let A be the matrix of a symmetric form with respect to a 
basis. 


(a) The null space of the form is the set of vectors v such that the coordinate vec- 
tor X of v is a solution of the homogeneous equation AX = 0. 


(b) The form is nondegenerate if and only if the matrix A is nonsingular. 


Proof. Via the basis, the form corresponds to the product $X^tAY$ [see (1.11)].
We might as well work with this product. If Y is a vector such that $AY = 0$, then
$X^tAY = 0$ for all X; hence Y is in the null space. Conversely, suppose that $AY \neq 0$.
Then AY has at least one nonzero coordinate. The ith coordinate of AY is $e_i^tAY$. So
one of the products $e_i^tAY$ is not zero. This shows that Y is not a null vector, which
proves (a). Part (b) of the proposition follows from (a). □


Here is a generalized version of (2.4):

(2.7) Proposition. Let W be a subspace of V, and consider the restriction of a
symmetric form $\langle\ ,\ \rangle$ to W. Suppose that this form is nondegenerate on W. Then
$V = W \oplus W^\perp$.

We omit the proof, which closely follows that of (2.4). □


(2.8) Definition. An orthogonal basis $\mathbf{B} = (v_1, \ldots, v_n)$ for V, with respect to a
symmetric form $\langle\ ,\ \rangle$, is a basis such that $v_i \perp v_j$ for all $i \neq j$.


Since the matrix A of a form is defined by $a_{ij} = \langle v_i, v_j \rangle$, the basis $\mathbf{B}$ is orthogonal
if and only if A is a diagonal matrix. Note that if the symmetric form $\langle\ ,\ \rangle$ is
nondegenerate and the basis $\mathbf{B} = (v_1, \ldots, v_n)$ is orthogonal, then $\langle v_i, v_i \rangle \neq 0$ for all i: the
diagonal entries of A are nonzero.


(2.9) Theorem. Let $\langle\ ,\ \rangle$ be a symmetric form on a real vector space V.

(a) There is an orthogonal basis for V. More precisely, there exists a basis
$\mathbf{B} = (v_1, \ldots, v_n)$ such that $\langle v_i, v_j \rangle = 0$ for $i \neq j$ and such that for each i, $\langle v_i, v_i \rangle$
is either 1, −1, or 0.
(b) Matrix form: Let A be a real symmetric $n \times n$ matrix. There is a matrix
$Q \in GL_n(\mathbb{R})$ such that $QAQ^t$ is a diagonal matrix each of whose diagonal
entries is 1, −1, or 0.


Part (b) of the theorem follows from (a), and (1.13), taking into account the fact
that any symmetric matrix A is the matrix of a symmetric form. □


We can permute an orthogonal basis $\mathbf{B}$ so that the indices with $\langle v_i, v_i \rangle = 1$ are
the first ones, and so on. Then the matrix A of the form will be

(2.10) $A = \begin{bmatrix} I_p & & \\ & -I_m & \\ & & 0_z \end{bmatrix}$,

where p is the number of +1's, m is the number of −1's, and z is the number of 0's,
so that $p + m + z = n$. These numbers are uniquely determined by the form or by
the matrix A:


(2.11) Theorem. Sylvester's Law: The numbers p, m, z appearing in (2.10) are
uniquely determined by the form. In other words, they do not depend on the choice
of orthogonal basis $\mathbf{B}$ such that $\langle v_i, v_i \rangle = \pm 1$ or 0.


The pair of integers (p, m) is called the signature of the form.


Proof of Theorem (2.9). If the form is identically zero, then the matrix A, computed
with respect to any basis, will be the zero matrix, which is diagonal. Suppose the
form is not identically zero. Then by Proposition (2.2), there is a vector $v = v_1$ with
$\langle v_1, v_1 \rangle \neq 0$. Let W be the span of $v_1$. By Proposition (2.4), $V = W \oplus W^\perp$, and so a
basis for V is obtained by combining the basis $(v_1)$ of W with any basis $(v_2, \ldots, v_n)$ of
$W^\perp$ [Chapter 3 (6.6)]. The form on V can be restricted to the subspace $W^\perp$, and it
defines a form there. We use induction on the dimension to conclude that $W^\perp$ has an
orthogonal basis $(v_2, \ldots, v_n)$. Then $(v_1, v_2, \ldots, v_n)$ is an orthogonal basis for V. For,
$\langle v_1, v_i \rangle = 0$ if $i > 1$ because $v_i \in W^\perp$, and $\langle v_i, v_j \rangle = 0$ if $i, j > 1$ and $i \neq j$,
because $(v_2, \ldots, v_n)$ is an orthogonal basis.

It remains to normalize the orthogonal basis just constructed. If $\langle v_i, v_i \rangle \neq 0$,
we solve $c^{-2} = \pm\langle v_i, v_i \rangle$ and change the basis vector $v_i$ to $cv_i$. Then $\langle v_i, v_i \rangle$ is
changed to ±1. This completes the proof of (2.9). □
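
The inductive proof above is effectively an algorithm. The sketch below follows it for a symmetric matrix with integer or rational entries, working over the rationals so that the arithmetic is exact. Because normalizing to ±1 would require square roots, the normalization step is skipped and only the signs of the values $\langle v_i, v_i \rangle$ are read off; the function names are chosen here for illustration.

```python
from fractions import Fraction

def form(A, x, y):
    """Value <x, y> = x^t A y of the bilinear form with matrix A."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def orthogonal_basis(A):
    """Return an orthogonal basis of Q^n for the symmetric form x^t A y."""
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    rest = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    basis = []
    while rest:
        # Look for a vector which is not self-orthogonal, as in Proposition (2.2).
        k = next((i for i, u in enumerate(rest) if form(A, u, u) != 0), None)
        if k is None:
            pair = next(((i, j) for i in range(len(rest))
                         for j in range(i + 1, len(rest))
                         if form(A, rest[i], rest[j]) != 0), None)
            if pair is None:
                basis.extend(rest)      # the form is zero on the span of `rest`
                break
            i, j = pair
            rest[i] = [a + b for a, b in zip(rest[i], rest[j])]   # u = v + w
            k = i
        v = rest.pop(k)
        vv = form(A, v, v)
        # Project the other vectors into the orthogonal complement of v, as in (2.4).
        rest = [[ui - form(A, u, v) / vv * vi for ui, vi in zip(u, v)] for u in rest]
        basis.append(v)
    return basis

def signature(A):
    """Return (p, m, z): the numbers of positive, negative, and zero values <v_i, v_i>."""
    values = [form(A, v, v) for v in orthogonal_basis(A)]
    p = sum(1 for c in values if c > 0)
    m = sum(1 for c in values if c < 0)
    return p, m, len(values) - p - m
```

Applied to the matrix (2.1) of the Lorentz form, `signature` returns `(3, 1, 0)`.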

Proof of Theorem (2.11). Let $r = p + m$. (This is the rank of the matrix A.) Let
$(v_1, \ldots, v_n)$ be an orthogonal basis of V of the type under consideration, that is, so
that the matrix is (2.10). We will first show that the number z is determined by proving
that the vectors $v_{r+1}, \ldots, v_n$ form a basis for the null space $N = V^\perp$. This will
show that $z = \dim N$, hence that z does not depend on the choice of a basis.

A vector $w \in V$ is a null vector if and only if it is orthogonal to every element
$v_i$ of our basis. We write our vector as a linear combination of the basis:
$w = c_1v_1 + \cdots + c_nv_n$. Then since $(v_i, v_j) = 0$ if $i \neq j$, we find $(w, v_i) = c_i(v_i, v_i)$.
Now $(v_i, v_i) = 0$ if and only if $i > r$. So in order for w to be orthogonal to every $v_i$,
we must have $c_i = 0$ for all $i \le r$. This shows that $(v_{r+1}, \ldots, v_n)$ spans N, and, being
a linearly independent set, it is a basis for N.

The equation $p + m + z = n$ proves that $p + m$ is also determined. We still
have to show that one of the two remaining integers p, m is determined. This is not
quite so simple. It is not true that the span of $(v_1, \ldots, v_p)$, for instance, is uniquely
determined by the form.

Suppose a second such basis $(v_1', \ldots, v_n')$ is given and leads to integers $p', m'$
(with $z' = z$). We will show that the $p + (n - p')$ vectors

(2.12)   $v_1, \ldots, v_p;\ v'_{p'+1}, \ldots, v'_n$

are linearly independent. Then since V has dimension n, it will follow that
$p + (n - p') \le n$, hence that $p \le p'$, and, interchanging the roles of p and $p'$,
that $p = p'$.

Let a linear relation between the vectors (2.12) be given. We may write it in
the form

(2.13)   $b_1v_1 + \cdots + b_pv_p = c_{p'+1}v'_{p'+1} + \cdots + c_nv'_n.$

Let v denote the vector defined by either of these two expressions. We compute
$(v, v)$ in two ways. The left-hand side gives

  $(v, v) = b_1^2(v_1, v_1) + \cdots + b_p^2(v_p, v_p) = b_1^2 + \cdots + b_p^2 \ge 0,$

while the right-hand side gives

  $(v, v) = c_{p'+1}^2(v'_{p'+1}, v'_{p'+1}) + \cdots + c_n^2(v'_n, v'_n) = -c_{p'+1}^2 - \cdots - c_{p'+m'}^2 \le 0.$

It follows that $b_1^2 + \cdots + b_p^2 = 0$, hence that $b_1 = \cdots = b_p = 0$. Once this is
known, the fact that $(v_1', \ldots, v_n')$ is a basis combines with (2.13) to imply
$c_{p'+1} = \cdots = c_n = 0$. Therefore the relation was trivial, as required. □
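
Sylvester's Law can be checked numerically. The following is a minimal sketch, not part of the text's argument: the helper names `signature` and `congruent` are hypothetical, and the method (simultaneous row and column operations over exact fractions) is one standard way to reach a diagonal matrix congruent to A. Only the signs of the diagonal entries are counted, so the normalization to $\pm 1$ is skipped.

```python
from fractions import Fraction

def signature(A):
    """Return (p, m, z) for a symmetric matrix A, using congruence operations."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]

    def add_row_col(src, dst, c):
        # row_dst += c*row_src, then col_dst += c*col_src: M becomes E M E^t.
        for k in range(n):
            M[dst][k] += c * M[src][k]
        for k in range(n):
            M[k][dst] += c * M[k][src]

    def swap(i, j):
        # Interchange basis vectors i and j (a congruence by a permutation matrix).
        M[i], M[j] = M[j], M[i]
        for row in M:
            row[i], row[j] = row[j], row[i]

    for i in range(n):
        if M[i][i] == 0:
            for j in range(i + 1, n):
                if M[j][j] != 0:
                    swap(i, j)
                    break
            else:
                # Every remaining diagonal entry is 0: replace v_i by v_i + v_j,
                # which makes the new diagonal entry equal to 2*(v_i, v_j).
                for j in range(i + 1, n):
                    if M[i][j] != 0:
                        add_row_col(j, i, Fraction(1))
                        break
        if M[i][i] == 0:
            continue  # v_i is orthogonal to all remaining basis vectors
        for j in range(i + 1, n):
            if M[i][j] != 0:
                add_row_col(i, j, -M[j][i] / M[i][i])

    diag = [M[i][i] for i in range(n)]
    p = sum(1 for d in diag if d > 0)
    m = sum(1 for d in diag if d < 0)
    return p, m, n - p - m

def congruent(Q, A):
    """Return Q A Q^t."""
    n = len(A)
    QA = [[sum(Q[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(QA[i][k] * Q[j][k] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2], [2, 1]]
Q = [[1, 2], [0, 1]]          # any invertible matrix would do
print(signature(A), signature(congruent(Q, A)))
```

Both calls report the same triple, as the theorem predicts for two matrices of the same form.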


For dealing with indefinite forms, the notation $I_{p,m}$ is often used to denote the
diagonal matrix

(2.14)   $I_{p,m} = \begin{pmatrix} I_p & \\ & -I_m \end{pmatrix}.$

With this notation, the matrix representing the Lorentz form (2.1) is $I_{3,1}$.


We will now prove Theorem (1.25), that a matrix A is positive definite if and
only if $\det A_i > 0$ for all i.

Proof of Theorem (1.25). Suppose that the form $X^tAY$ is positive definite. A
change of basis in $\mathbb{R}^n$ changes the matrix to $A' = QAQ^t$, and

  $\det A' = (\det Q)(\det A)(\det Q^t) = (\det Q)^2(\det A).$

Since they differ by a square factor, $\det A'$ is positive if and only if $\det A$ is
positive. By (1.19), we can choose a matrix Q so that $A' = I$, and since I has
determinant 1, $\det A > 0$.

The matrix $A_i$ represents the restriction of the form to the subspace $V_i$ spanned
by $(v_1, \ldots, v_i)$, and of course the form is positive definite on $V_i$. Therefore $\det A_i > 0$
for the same reason that $\det A > 0$.

Conversely, suppose that $\det A_i$ is positive for all i. By induction on n, we may
assume the form to be positive definite on $V_{n-1}$. Therefore there is a matrix
$Q' \in GL_{n-1}$ such that $Q'A_{n-1}Q'^t = I_{n-1}$. Let Q be the matrix

  $Q = \begin{pmatrix} Q' & 0 \\ 0 & 1 \end{pmatrix}.$

Then

  $QAQ^t = \begin{pmatrix} I_{n-1} & * \\ * & * \end{pmatrix}.$

We now clear out the bottom row of this matrix, except for the (n, n) entry, by
elementary row operations $E_1, \ldots, E_{n-1}$. Let $P = E_{n-1} \cdots E_1 Q$. Then

  $A' = PAP^t = \begin{pmatrix} I_{n-1} & 0 \\ 0 & c \end{pmatrix}$

for some c. The last column has also been cleared out because $A' = PAP^t$ is symmetric.
Since $\det A > 0$, we have $\det A' = (\det A)(\det P)^2 > 0$ too, and this implies
that $c > 0$. Therefore the matrix $A'$ represents a positive definite form. It also
represents the same form as A does. So A is positive definite. □
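
The criterion of Theorem (1.25) translates directly into a test on the upper-left minors. Here is a small sketch under the assumption that A is symmetric with rational entries; the names `det` and `is_positive_definite` are illustrative only, and the determinant is computed by ordinary Gaussian elimination over exact fractions.

```python
from fractions import Fraction

def det(M):
    """Determinant by exact Gaussian elimination."""
    n = len(M)
    M = [[Fraction(x) for x in row] for row in M]
    d = Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            d = -d                      # a row interchange changes the sign
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for k in range(i, n):
                M[r][k] -= f * M[i][k]
    return d

def is_positive_definite(A):
    """Theorem (1.25): A is positive definite iff det A_i > 0 for i = 1, ..., n."""
    n = len(A)
    return all(det([row[:i] for row in A[:i]]) > 0 for i in range(1, n + 1))

print(is_positive_definite([[2, 1], [1, 2]]))   # minors 2 and 3
print(is_positive_definite([[1, 2], [2, 1]]))   # minors 1 and -3
```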


3. THE GEOMETRY ASSOCIATED TO A POSITIVE FORM 


In this section we return to look once more at a positive definite bilinear form ( , ) on
an n-dimensional real vector space V. A real vector space together with such a form
is often called a Euclidean space.

It is natural to define the length of a vector v by the rule

(3.1)   $|v| = \sqrt{(v, v)},$

in analogy with the length of vectors in $\mathbb{R}^n$ [Chapter 4 (5.10)]. One important
consequence of the fact that the form is positive definite is that we can decide whether a
vector v is zero by computing its length:

(3.2)   $v = 0$ if and only if $(v, v) = 0$.


As was shown in Section 1, there is an orthonormal basis $B = (v_1, \ldots, v_n)$ for
V, and thereby the form corresponds to dot product on $\mathbb{R}^n$:

  $(v, w) = X^tY,$

if $v = BX$ and $w = BY$. Using this correspondence, we can transfer the geometry of
$\mathbb{R}^n$ over to V. Whenever a problem is presented to us on a Euclidean space V, a
natural procedure will be to choose a convenient orthonormal basis, thereby reducing
the problem to the familiar case of dot product on $\mathbb{R}^n$.

When a subspace W of V is given to us, there are two operations we can make.
The first is to restrict the form ( , ) to the subspace, simply by defining the value of
the form on a pair $w_1, w_2$ of vectors in W to be $(w_1, w_2)$. The restriction of a bilinear
form to a subspace W is a bilinear form on W, and if the form is symmetric or if it is
symmetric and positive definite, then so is the restriction.

Restriction of the form can be used to define the unoriented angle between two
vectors v, w. If the vectors are linearly dependent, the angle is zero. Otherwise,
(v, w) is a basis of a two-dimensional subspace W of V. The restriction of the form
to W is still positive definite, and therefore there is an orthonormal basis $(w_1, w_2)$ for
W. By means of this basis, v, w correspond to their coordinate vectors X, Y in $\mathbb{R}^2$.
This allows us to interpret geometric properties of the vectors v, w in terms of
properties of X, Y.

Since the basis $(w_1, w_2)$ is orthonormal, the form corresponds to dot product
on $\mathbb{R}^2$: $(v, w) = X^tY$. Therefore

  $|v| = |X|$, $|w| = |Y|$, and $(v, w) = (X \cdot Y)$.

We define the angle $\theta$ between v and w to be the angle between X and Y, and thereby
obtain the formula

(3.3)   $(v, w) = |v||w| \cos\theta,$

as a consequence of the analogous formula [Chapter 4 (5.11)] for dot product in $\mathbb{R}^2$.
This formula determines $\cos\theta$ in terms of the other symbols, and $\cos\theta$ determines $\theta$
up to a factor of $\pm 1$. Therefore the angle between v and w is determined up to sign
by the form alone. This is the best that can be done, even in $\mathbb{R}^2$.

Standard facts such as the Schwarz Inequality

(3.4)   $|(v, w)| \le |v||w|$

and the Triangle Inequality

(3.5)   $|v + w| \le |v| + |w|$

can also be proved for arbitrary Euclidean spaces by restriction to a two-dimensional
subspace.

The second operation we can make when a subspace W is given is to project V
onto W. Since the restriction of the form to W is positive definite, it is nondegenerate.
Therefore $V = W \oplus W^\perp$ by (2.17), and so every $v \in V$ has a unique expression

(3.6)   $v = w + w'$, with $w \in W$ and $(w, w') = 0$.

The orthogonal projection $\pi: V \to W$ is defined to be the linear transformation

(3.7)   $v \mapsto \pi(v) = w,$

where w is as in (3.6).


The projected vector $\pi(v)$ can be computed easily in terms of an orthonormal
basis $(w_1, \ldots, w_r)$ of W. What follows is important:

(3.8) Proposition. Let $(w_1, \ldots, w_r)$ be an orthonormal basis of a subspace W, and
let $v \in V$. The orthogonal projection $\pi(v)$ of v onto W is the vector

  $\pi(v) = (v, w_1)w_1 + \cdots + (v, w_r)w_r.$

Thus if $\pi$ is defined by the above formula, then $v - \pi(v)$ is orthogonal to W. This
formula explains the geometric meaning of the Gram-Schmidt procedure described
in Section 1.

Proof. Let us denote the right side of the above equation by $\tilde{w}$. Then $(\tilde{w}, w_i) =
(v, w_i)(w_i, w_i) = (v, w_i)$ for $i = 1, \ldots, r$, hence $v - \tilde{w} \in W^\perp$. Since the expression
(3.6) for v is unique, $w = \tilde{w}$ and $w' = v - \tilde{w}$. □
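
As an illustration of Proposition (3.8), the sketch below projects a vector of $\mathbb{R}^3$ onto a plane, using dot product as the form. The particular vectors and the function names `dot` and `project` are chosen here for the example; exact fractions are used so that orthogonality can be checked without rounding.

```python
from fractions import Fraction as F

def dot(x, y):
    """Dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project(v, onb):
    """pi(v) = (v, w_1) w_1 + ... + (v, w_r) w_r for an orthonormal list onb."""
    result = [F(0)] * len(v)
    for w in onb:
        c = dot(v, w)
        result = [r + c * wi for r, wi in zip(result, w)]
    return result

# An orthonormal basis (w1, w2) of a plane W in R^3.
w1 = [F(1), F(0), F(0)]
w2 = [F(0), F(3, 5), F(4, 5)]

v = [F(1), F(2), F(3)]
pv = project(v, [w1, w2])
residual = [a - b for a, b in zip(v, pv)]   # the component w' = v - pi(v)

print(pv)
print(dot(residual, w1), dot(residual, w2))  # both vanish, so w' lies in W-perp
```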


The case W = V is also important. In this case, $\pi$ is the identity map.

(3.9) Corollary. Let $B = (v_1, \ldots, v_n)$ be an orthonormal basis for a Euclidean
space V. Then

  $v = (v, v_1)v_1 + \cdots + (v, v_n)v_n.$

In other words, the coordinate vector of v with respect to the orthonormal basis B is

  $X = ((v, v_1), \ldots, (v, v_n))^t.$ □


4. HERMITIAN FORMS 


In this section we assume that our scalar field is the field $\mathbb{C}$ of complex numbers.
When working with complex vector spaces, it is desirable to have an analogue of the
concept of the length of a vector, and of course one can define length on $\mathbb{C}^n$ by
identifying it with $\mathbb{R}^{2n}$. If $X = (x_1, \ldots, x_n)^t$ is a complex vector and if $x_r = a_r + b_ri$, then
the length of X is

(4.1)   $|X| = \sqrt{a_1^2 + b_1^2 + \cdots + a_n^2 + b_n^2} = \sqrt{\bar{x}_1x_1 + \cdots + \bar{x}_nx_n},$

where the bar denotes complex conjugation. This formula suggests that dot product
is "wrong" for complex vectors and that we should define a product by the formula

(4.2)   $(X, Y) = \bar{X}^tY = \bar{x}_1y_1 + \cdots + \bar{x}_ny_n.$


This product has the positivity property:

(4.3)   $(X, X)$ is a positive real number if $X \neq 0$.

Moreover, (4.2) agrees with dot product for real vectors.
The product (4.2) is called the standard hermitian product, or the hermitian
dot product. It has these properties:

(4.4)
Linearity in the second variable:
  $(X, cY) = c(X, Y)$ and $(X, Y_1 + Y_2) = (X, Y_1) + (X, Y_2)$;
Conjugate linearity in the first variable:
  $(cX, Y) = \bar{c}(X, Y)$ and $(X_1 + X_2, Y) = (X_1, Y) + (X_2, Y)$;
Hermitian symmetry:
  $(Y, X) = \overline{(X, Y)}$.


So we can have a positive definite product at a small cost in linearity and symmetry. 
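
The rules for the standard hermitian product are easy to try out on small vectors. The snippet below is only a sketch: `hdot` is a name introduced here, Python's built-in complex numbers stand in for $\mathbb{C}$, and the sample vectors are arbitrary.

```python
def hdot(X, Y):
    """Standard hermitian product: conjugate the entries of the first vector."""
    return sum(x.conjugate() * y for x, y in zip(X, Y))

X = [1 + 2j, 3 - 1j]
Y = [2 - 1j, 1j]
c = 2 + 3j

print(hdot(X, X))                                  # real and positive
print(hdot(X, [c * y for y in Y]), c * hdot(X, Y))  # linear in the second variable
print(hdot([c * x for x in X], Y), c.conjugate() * hdot(X, Y))  # conjugate linear in the first
print(hdot(Y, X), hdot(X, Y).conjugate())          # hermitian symmetry
```

Each printed pair agrees, and the first value has zero imaginary part.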

When one wants to work with notions involving length, the hermitian product 
is the right one, though symmetric bilinear forms on complex vector spaces also 
come up in applications. 

If V is a complex vector space, a hermitian form on V is any function of two
variables

(4.5)   $V \times V \to \mathbb{C}$,   $v, w \mapsto (v, w)$

satisfying the relations (4.4). Let $B = (v_1, \ldots, v_n)$ be a basis for V. Then the matrix
of the form is defined in the analogous way as the matrix of a bilinear form:

  $A = (a_{ij})$, where $a_{ij} = (v_i, v_j)$.

The formula for the form now becomes

(4.6)   $(v, w) = \bar{X}^tAY,$

if $v = BX$ and $w = BY$.
The matrix A is not arbitrary, because hermitian symmetry implies that

  $a_{ij} = (v_i, v_j) = \overline{(v_j, v_i)} = \bar{a}_{ji},$

that is, that $A = \bar{A}^t$. Let us introduce the adjoint of a matrix A [different from the
one defined in Chapter 1 (5.4)] as

(4.7)   $A^* = \bar{A}^t.$

It satisfies the following rules:

  $(A + B)^* = A^* + B^*$
  $(AB)^* = B^*A^*$
  $(A^{-1})^* = (A^*)^{-1}$
  $A^{**} = A.$

These rules are easy to check. Formula (4.6) can now be rewritten as

(4.8)   $(v, w) = X^*AY,$

and the standard hermitian product on $\mathbb{C}^n$ becomes $(X, Y) = X^*Y$.
A matrix A is called hermitian or self-adjoint if

(4.9)   $A = A^*,$

and it is the hermitian matrices which are matrices of hermitian forms. Their entries
satisfy $a_{ji} = \bar{a}_{ij}$. This implies that the diagonal entries are real and that the entries
below the diagonal are complex conjugates of those above it:

  $A = \begin{pmatrix} r_1 & & a_{ij} \\ & \ddots & \\ \bar{a}_{ij} & & r_n \end{pmatrix}$,   $r_i \in \mathbb{R}$, $a_{ij} \in \mathbb{C}$.


For example, $\begin{pmatrix} 2 & i \\ -i & 3 \end{pmatrix}$ is a hermitian matrix.


Note that the condition for a real matrix to be hermitian is $a_{ji} = a_{ij}$:

(4.10)   The real hermitian matrices are the real symmetric matrices.


The discussion of change of basis in Sections 1 and 2 has analogues for hermitian
forms. Given a hermitian form, a change of basis by a matrix P leads as in
(1.12) to

  $X'^*A'Y' = (PX)^*A'PY = X^*(P^*A'P)Y.$

Hence the new matrix $A'$ satisfies

(4.11)   $A = P^*A'P$ or $A' = (P^*)^{-1}AP^{-1}.$

Since P is arbitrary, we can replace it by $Q = (P^*)^{-1}$ to obtain the description
analogous to (1.13):

(4.12) Corollary. Let A be the matrix of a hermitian form with respect to a basis.
The matrices which represent the same hermitian form with respect to different
bases are those of the form $A' = QAQ^*$, for some invertible matrix $Q \in GL_n(\mathbb{C})$. □

For hermitian forms, the analogues of orthogonal matrices are the unitary
matrices. A matrix P is called unitary if it satisfies the condition

(4.13)   $P^*P = I$ or $P^* = P^{-1}.$


For example, $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$ is a unitary matrix.


Note that for a real matrix P, this condition becomes $P^tP = I$:

(4.14)   The real unitary matrices are the real orthogonal matrices.

The unitary matrices form a group, the unitary group $U_n$:

(4.15)   $U_n = \{P \mid P^*P = I\}.$


Formula (4.11) tells us that unitary matrices represent changes of basis which leave
the standard hermitian product $X^*Y$ invariant:

(4.16) Corollary. A change of basis preserves the standard hermitian product,
that is, $X^*Y = X'^*Y'$, if and only if its matrix P is unitary. □
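
Corollary (4.16) can be illustrated with a unitary matrix and, for contrast, an invertible matrix that is not unitary. This is a sketch only: the matrices, the sample vectors, and the helper names are assumptions made for the example, and floating-point arithmetic is used, so equalities hold up to rounding.

```python
import math

def hdot(X, Y):
    """Standard hermitian product X*Y."""
    return sum(x.conjugate() * y for x, y in zip(X, Y))

def matvec(P, X):
    return [sum(p * x for p, x in zip(row, X)) for row in P]

def adjoint(P):
    """Conjugate transpose P*."""
    return [[P[i][j].conjugate() for i in range(len(P))] for j in range(len(P[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

s = 1 / math.sqrt(2)
P = [[s, s * 1j], [s * 1j, s]]    # satisfies P*P = I up to rounding
Q = [[1, 1], [0, 1]]              # invertible, but Q*Q is not I

X = [1 + 2j, 3 - 1j]
Y = [2 - 1j, 1j]

before = hdot(X, Y)
after_unitary = hdot(matvec(P, X), matvec(P, Y))
after_shear = hdot(matvec(Q, X), matvec(Q, Y))
print(before, after_unitary, after_shear)
```

The first two values agree up to rounding, while the third differs, as expected for a change of basis that is not unitary.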


But Corollary (4.12) tells us that a general change of basis changes the standard
hermitian product $X^*Y$ to $X'^*A'Y'$, where $A' = QQ^*$ and $Q \in GL_n(\mathbb{C})$.

The notion of orthogonality for hermitian forms is defined exactly as for symmetric
bilinear forms: v is called orthogonal to w if $(v, w) = 0$. Since $(v, w) =
\overline{(w, v)}$, orthogonality is still a symmetric relation. We can now copy the discussion
of Sections 1 and 2 for hermitian forms without essential change, and Sylvester's
Law (2.11) for real symmetric forms carries over to the hermitian case. In particular,
we can speak of positive definite forms, those having the property that

(4.17)   $(v, v)$ is a positive real number if $v \neq 0$,

and of orthonormal bases $B = (v_1, \ldots, v_n)$, those such that

(4.18)   $(v_i, v_i) = 1$ and $(v_i, v_j) = 0$ if $i \neq j$.


(4.19) Theorem. Let (, ) be a hermitian form on a complex vector space V. There 
is an orthonormal basis for V if and only if the form is positive definite. 


(4.20) Proposition. Let W be a subspace of a hermitian space V. If the restriction
of the form to W is nondegenerate, then $V = W \oplus W^\perp$.


The proofs of these facts are left as exercises. □


5. THE SPECTRAL THEOREM


In this section we will study an n-dimensional complex vector space V and a positive
definite hermitian form ( , ) on V. A complex vector space on which a positive
definite hermitian form is given is often called a hermitian space. You can imagine
that V is $\mathbb{C}^n$, with its standard hermitian product $X^*Y$, if you want to. The choice of
an orthonormal basis in V will allow such an identification.

Since the form ( , ) is given, we will not want to choose an arbitrary basis for V
in order to make computations. It is natural to work exclusively with orthonormal
bases. This changes all previous calculations in the following way: It will no longer
be true that the matrix P of a change of basis is an arbitrary invertible matrix.
Rather, if $B = (v_1, \ldots, v_n)$, $B' = (v_1', \ldots, v_n')$ are two orthonormal bases, then the
matrix P relating them will be unitary. The fact that the bases are orthonormal means
that the matrix of the form ( , ) with respect to each basis is the identity I, and so
(4.11) reads $I = P^*IP$, or $P^*P = I$.

We are going to study a linear operator

(5.1)   $T: V \to V$

on our space. Let B be an orthonormal basis, and let M be the associated matrix of T.
A change of orthonormal basis changes M to $M' = PMP^{-1}$ [Chapter 4 (3.4)] where P
is unitary; hence

(5.2)   $M' = PMP^*.$


(5.3) Proposition. Let T be a linear operator on a hermitian space V, and let M be
the matrix of T with respect to an orthonormal basis B.

(a) The matrix M is hermitian if and only if $(v, Tw) = (Tv, w)$ for all $v, w \in V$.
If so, T is called a hermitian operator.

(b) The matrix M is unitary if and only if $(v, w) = (Tv, Tw)$ for all $v, w \in V$.
If so, T is called a unitary operator.

Proof. Let X, Y be the coordinate vectors of v, w: $v = BX$, $w = BY$, so that
$(v, w) = X^*Y$ and $Tv = BMX$. Then $(v, Tw) = X^*MY$, and $(Tv, w) = X^*M^*Y$. So if
$M = M^*$, then $(v, Tw) = (Tv, w)$ for all v, w; that is, T is hermitian. Conversely, if
T is hermitian, we set $v = e_i$, $w = e_j$ as in the proof of (1.9) to obtain
$b_{ij} = e_i^*(Me_j) = (e_i^*M^*)e_j = \bar{b}_{ji}$. Thus $M = M^*$. Similarly, $(v, w) = X^*Y$ and
$(Tv, Tw) = X^*M^*MY$, so $(v, w) = (Tv, Tw)$ for all v, w if and only if $M^*M = I$. □


(5.4) Theorem. Spectral Theorem:

(a) Let T be a hermitian operator on a hermitian vector space V. There is an
orthonormal basis of V consisting of eigenvectors of T.

(b) Matrix form: Let M be a hermitian matrix. There is a unitary matrix P such that
$PMP^*$ is a real diagonal matrix.

Proof. Choose an eigenvector $v = v_1$, and normalize so that its length is 1:
$(v, v) = 1$. Extend to an orthonormal basis. Then the matrix of T becomes

  $M = \begin{pmatrix} a_{11} & * \cdots * \\ 0 & \\ \vdots & N \\ 0 & \end{pmatrix}.$

Since T is hermitian, so is the matrix M (5.3). This implies that $* \cdots * = 0 \cdots 0$ and
that N is hermitian. Proceed by induction. □


To diagonalize a hermitian matrix M by a unitary P, one can proceed by determining
the eigenvectors. If the eigenvalues are distinct, the corresponding eigenvectors
will be orthogonal. This follows from the Spectral Theorem. Let $B'$ be the
orthonormal basis obtained by normalizing the lengths of the eigenvectors to 1.
Then $P = [B']^{-1}$ [Chapter 3 (4.20)].

For example, let

  $M = \begin{pmatrix} 2 & i \\ -i & 2 \end{pmatrix}.$

The eigenvalues of this matrix are 3, 1, and the vectors

  $v_1 = \begin{pmatrix} 1 \\ -i \end{pmatrix}$,   $v_2 = \begin{pmatrix} 1 \\ i \end{pmatrix}$

are eigenvectors with these eigenvalues. We normalize their lengths to 1 by the
factor $\frac{1}{\sqrt{2}}$. Then

  $P = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}$   and   $PMP^* = \begin{pmatrix} 3 & \\ & 1 \end{pmatrix}.$
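
The recipe just described can be followed numerically. The sketch below assumes the hermitian matrix with rows (2, i) and (-i, 2), whose eigenvalues are 3 and 1 with eigenvectors $(1, -i)^t$ and $(1, i)^t$; the helper names are introduced only for this illustration, and the arithmetic is floating-point.

```python
import math

def adjoint(A):
    """Conjugate transpose A*."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

M = [[2, 1j], [-1j, 2]]

# Columns of B are the eigenvectors (1, -i)^t and (1, i)^t, scaled to length 1.
s = 1 / math.sqrt(2)
B = [[s, s], [-1j * s, 1j * s]]

# B is unitary, so its inverse is its adjoint: P = [B']^{-1} = B*.
P = adjoint(B)
D = matmul(matmul(P, M), adjoint(P))
print(D)   # approximately diag(3, 1)
```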

But the Spectral Theorem asserts that a hermitian matrix can be diagonalized
even if its eigenvalues aren't distinct. This statement becomes particularly simple for
2 × 2 matrices: If the characteristic polynomial of a 2 × 2 hermitian matrix M has a
double root, then there is a unitary matrix P such that $PMP^* = aI$. Bringing the P's
over to the other side of the equation, we obtain $M = P^*aIP = aP^*P = aI$. So it
follows from the Spectral Theorem that $M = aI$. The only 2 × 2 hermitian matrices
whose characteristic polynomials have a double root are the matrices $aI$, where a is
a real number. We can verify this fact directly from the definition. We write
$M = \begin{pmatrix} a & \beta \\ \bar\beta & d \end{pmatrix}$, where a, d are real and $\beta$ is complex. Then the characteristic
polynomial is $t^2 - (a + d)t + (ad - \beta\bar\beta)$. This polynomial has a double root if and only
if its discriminant vanishes, that is, if

  $(a + d)^2 - 4(ad - \beta\bar\beta) = (a - d)^2 + 4\beta\bar\beta = 0.$

Both of the terms $(a - d)^2$ and $\beta\bar\beta$ are nonnegative real numbers. So if the
discriminant vanishes, then $a = d$ and $\beta = 0$. In this case, $M = aI$, as predicted.

Here is an interesting consequence of the Spectral Theorem for which we can 
give a direct proof: 


(5.5) Proposition. The eigenvalues of a hermitian operator T are real numbers. 


Proof. Let a be an eigenvalue, and let v be an eigenvector for T such that
$T(v) = av$. Then by (5.3) $(Tv, v) = (v, Tv)$; hence $(av, v) = (v, av)$. By conjugate
linearity (4.4),

  $\bar{a}(v, v) = (av, v) = (v, av) = a(v, v),$

and $(v, v) \neq 0$ because the form ( , ) is positive definite. Hence $a = \bar{a}$. This shows
that a is real. □


The results we have proved for hermitian matrices have analogues for real
symmetric matrices. Let V be a real vector space with a positive definite bilinear
form ( , ). Let T be a linear operator on V.

(5.6) Proposition. Let M be the matrix of T with respect to an orthonormal basis.

(a) The matrix M is symmetric if and only if $(v, Tw) = (Tv, w)$ for all $v, w \in V$.
If so, T is called a symmetric operator.
(b) The matrix M is orthogonal if and only if $(v, w) = (Tv, Tw)$ for all $v, w \in V$.
If so, T is called an orthogonal operator. □

(5.7) Proposition. The eigenvalues of a real symmetric matrix are real.

Proof. A real symmetric matrix is hermitian. So this is a special case of (5.5). □


(5.8) Theorem. Spectral Theorem (real case):

(a) Let T be a symmetric operator on a real vector space V with a positive definite
bilinear form. There is an orthonormal basis of eigenvectors of T.

(b) Matrix form: Let M be a real symmetric n × n matrix. There is an orthogonal
matrix $P \in O_n(\mathbb{R})$ such that $PMP^t$ is diagonal.

Proof. Now that we know that the eigenvalues of such an operator are real, we
can copy the proof of (5.4). □


6. CONICS AND QUADRICS 


A conic is the locus in the plane $\mathbb{R}^2$ defined by a quadratic equation in two variables,
of the form

(6.1)   $f(x_1, x_2) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 + b_1x_1 + b_2x_2 + c = 0.$

More precisely, the locus (6.1) is a conic, meaning an ellipse, a hyperbola, or a
parabola, or else it is called degenerate. A degenerate conic can be a pair of lines, a
single line, a point, or empty, depending on the particular equation. The term
quadric is used to designate the analogous loci in three or more dimensions.

The quadratic part of $f(x_1, x_2)$ is called a quadratic form:

(6.2)   $q(x_1, x_2) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2.$

In general, a quadratic form in n variables $x_1, \ldots, x_n$ is a polynomial each of whose
terms has degree 2 in the variables.

It is convenient to express the form $q(x_1, x_2)$ in matrix notation. To do this, we
introduce the symmetric matrix

(6.3)   $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}.$

Then $q(x_1, x_2) = X^tAX$, where X denotes the column vector $(x_1, x_2)^t$. We also
introduce the row vector $B = (b_1, b_2)$. Then equation (6.1) can be written in matrix
notation as

(6.4)   $X^tAX + BX + c = 0.$

We put the coefficient 2 into formulas (6.1) and (6.2) in order to avoid some
coefficients $\frac{1}{2}$ in the matrix (6.3). An alternative way to write the quadratic form
would be

  $q(x_1, x_2) = a_{11}x_1^2 + a_{12}x_1x_2 + a_{12}x_2x_1 + a_{22}x_2^2.$


We propose to describe the congruence classes of conics as geometric figures 
or, what is the same, their orbits under the action of the group M of rigid motions of 
the plane. A rigid motion will produce a change of variable in equation (6.1). 


(6.5) Theorem. Every nondegenerate conic is congruent to one of the following:

(i) Ellipse: $a_{11}x_1^2 + a_{22}x_2^2 - 1 = 0$,
(ii) Hyperbola: $a_{11}x_1^2 - a_{22}x_2^2 - 1 = 0$,
(iii) Parabola: $a_{11}x_1^2 - x_2 = 0$, where $a_{11}, a_{22} > 0$.


Proof. We simplify equation (6.1) in two steps, first applying an orthogonal
transformation (a rotation or reflection) to diagonalize A and then applying a
translation to eliminate, as much as possible, the linear and constant terms $BX + c$.

By the Spectral Theorem (5.8), there is an orthogonal matrix P such that $PAP^t$
is diagonal. We make the change of variable $X' = PX$, or $X = P^tX'$. Substitution into
equation (6.4) yields

(6.6)   $X'^t(PAP^t)X' + (BP^t)X' + c = 0.$

Hence there is an orthogonal change of variable such that the quadratic form
becomes diagonal, that is, the coefficient $a_{12}$ of $x_1x_2$ is zero.

Suppose that A is diagonal. Then f has the form

  $f(x_1, x_2) = a_{11}x_1^2 + a_{22}x_2^2 + b_1x_1 + b_2x_2 + c = 0.$

We eliminate $b_i$ by completing the squares, making the substitution

(6.7)   $x_i = x_i' - \frac{b_i}{2a_{ii}}.$

This substitution results in

(6.8)   $f(x_1, x_2) = a_{11}x_1'^2 + a_{22}x_2'^2 + c',$

where $c'$ is a number which can be determined if desired. This substitution
corresponds to translation by the vector $(b_1/2a_{11}, b_2/2a_{22})^t$, and we can make it provided
$a_{11}, a_{22}$ are not zero.

If $a_{ii} = 0$ but $b_i \neq 0$, then we can use the substitution

(6.9)   $x_i = x_i' - c/b_i$

to eliminate the constant term instead. We may normalize one coefficient to $-1$. Doing
so and eliminating degenerate conics leaves us with the three cases listed in the
theorem. It is not difficult to show that a change of the coefficients $a_{11}, a_{22}$ results in
a different congruence class, except for the interchange of $a_{11}, a_{22}$ in the equation of
an ellipse. □


The method used above can be applied in any number of variables to classify
quadrics in n dimensions. The general quadratic equation has the form

(6.10)   $f(x_1, \ldots, x_n) = \sum_i a_{ii}x_i^2 + \sum_{i<j} 2a_{ij}x_ix_j + \sum_i b_ix_i + c = 0.$

We could also write this equation more compactly as

(6.11)   $f(x_1, \ldots, x_n) = \sum_{i,j} a_{ij}x_ix_j + \sum_i b_ix_i + c = 0,$

where the first sum is over all pairs of indices, and where we set $a_{ji} = a_{ij}$.
We define the matrices A, B to be

  $A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{12} & \ddots & & \vdots \\ \vdots & & & \\ a_{1n} & \cdots & & a_{nn} \end{pmatrix}$,   $B = (b_1, \ldots, b_n).$

Then the quadratic form is

(6.12)   $q(x_1, \ldots, x_n) = X^tAX,$

and

(6.13)   $f(x_1, \ldots, x_n) = X^tAX + BX + c.$

By a suitable orthogonal transformation P, the quadric is carried to (6.6), where
$PAP^t$ is diagonal. When A is diagonal, linear terms are eliminated by the translation
(6.7), or else (6.9) is used.
Here is the classification in three variables:

(6.14) Theorem. The congruence classes of nondegenerate quadrics in $\mathbb{R}^3$ are
represented by

(i) Ellipsoids: $a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 - 1 = 0$,

(ii) 1-sheeted hyperboloids: $a_{11}x_1^2 + a_{22}x_2^2 - a_{33}x_3^2 - 1 = 0$,

(iii) 2-sheeted hyperboloids: $a_{11}x_1^2 - a_{22}x_2^2 - a_{33}x_3^2 - 1 = 0$,

(iv) Elliptic paraboloids: $a_{11}x_1^2 + a_{22}x_2^2 - x_3 = 0$,

(v) Hyperbolic paraboloids: $a_{11}x_1^2 - a_{22}x_2^2 - x_3 = 0$,

where $a_{11}, a_{22}, a_{33} > 0$. □

If a quadratic equation $f(x_1, x_2) = 0$ is given, we can determine the type of
conic it represents most easily by allowing nonorthogonal changes of coordinates.
For example, if the associated quadratic form q is positive definite, then the conic is
either an ellipse, or else it is degenerate (a single point or empty). To distinguish
these cases, arbitrary changes of coordinates are permissible. A nonorthogonal
coordinate change will distort the conic, but it will not change an ellipse into a
hyperbola or a degenerate conic.

As an example, consider the locus

(6.15)   $x_1^2 + x_1x_2 + x_2^2 + 4x_1 + 3x_2 + 4 = 0.$

The associated matrix is

  $A = \begin{pmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & 1 \end{pmatrix},$

which is positive definite by (1.25). We diagonalize A by the nonorthogonal substitution
$X = PX'$, where

  $P = \begin{pmatrix} 1 & -\frac{1}{2} \\ 0 & 1 \end{pmatrix}$,   $P^tAP = \begin{pmatrix} 1 & \\ & \frac{3}{4} \end{pmatrix}$,   $BP = (4, 1)$,

to obtain

  $x_1'^2 + \tfrac{3}{4}x_2'^2 + 4x_1' + x_2' + 4 = 0.$

Completing the square yields

  $x_1''^2 + \tfrac{3}{4}x_2''^2 = \tfrac{1}{3},$

an ellipse. Thus (6.15) represents an ellipse too. On the other hand, if we change the
constant term of (6.15) to 5, the locus becomes empty.
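
The arithmetic in this example can be reproduced with exact fractions. The sketch below assumes the locus $x_1^2 + x_1x_2 + x_2^2 + 4x_1 + 3x_2 + c = 0$ and the substitution $X = PX'$ with P upper triangular; the helper names, including `reduced_rhs`, are hypothetical. After the form is diagonal, each variable is handled by the identity $dx^2 + bx = d(x + b/2d)^2 - b^2/4d$.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

A = [[F(1), F(1, 2)], [F(1, 2), F(1)]]   # matrix of the quadratic part
B = [F(4), F(3)]                         # coefficients of the linear part
P = [[F(1), F(-1, 2)], [F(0), F(1)]]     # nonorthogonal substitution X = P X'

D = matmul(matmul(transpose(P), A), P)                           # P^t A P
Bp = [sum(B[k] * P[k][j] for k in range(2)) for j in range(2)]   # B P

def reduced_rhs(c):
    """Right side r of d1*x1''^2 + d2*x2''^2 = r after completing the squares."""
    return sum(Bp[i] ** 2 / (4 * D[i][i]) for i in range(2)) - c

print(D, Bp)
print(reduced_rhs(4))   # positive: an ellipse
print(reduced_rhs(5))   # negative: the locus is empty
```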


7. THE SPECTRAL THEOREM FOR NORMAL OPERATORS


The Spectral Theorem (5.4) tells us that any hermitian matrix M can be transformed 
into a real diagonal matrix D by a unitary matrix P: D = PMP*. We now ask for the 
matrices M which can be transformed in the same way to a diagonal matrix D, but 
where we no longer require D to be real. It turns out that there is an elegant formal 
characterization of such matrices. 


(7.1) Definition. A matrix M is called normal if it commutes with its adjoint, that 
is, if MM* = M*M. 


(7.2) Lemma. If M is normal and P is unitary, then $M' = PMP^*$ is also normal,
and conversely.

Proof. Assume that M is normal. Then $M'M'^* = PMP^*(PMP^*)^* = PMM^*P^* =
PM^*MP^* = (PMP^*)^*(PMP^*) = M'^*M'$. So $PMP^*$ is normal. The converse follows by
replacing P by $P^*$. □


This lemma allows us to define a normal operator $T: V \to V$ on a hermitian
space V to be a linear operator whose matrix M with respect to any orthonormal basis
is a normal matrix.

(7.3) Theorem. A complex matrix M is normal if and only if there is a unitary
matrix P such that $PMP^*$ is diagonal. □


The most important normal matrices, aside from hermitian ones, are unitary
matrices: Since $M^* = M^{-1}$ if M is unitary, $MM^* = M^*M = I$, which shows that M is
normal.

(7.4) Corollary. Every conjugacy class in the unitary group contains a diagonal
matrix. □
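
Definition (7.1) is simple to test on examples. In this sketch the function `is_normal` and the three sample matrices are assumptions made for illustration; the entries are small integers or Gaussian integers, so the comparison of $MM^*$ with $M^*M$ is exact.

```python
def adjoint(M):
    """Conjugate transpose M*."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_normal(M):
    """True if M commutes with its adjoint."""
    return matmul(M, adjoint(M)) == matmul(adjoint(M), M)

hermitian = [[2, 1j], [-1j, 2]]
rotation = [[0, -1], [1, 0]]     # unitary (real orthogonal), not hermitian
shear = [[1, 1], [0, 1]]         # neither hermitian nor unitary

print(is_normal(hermitian), is_normal(rotation), is_normal(shear))
```

The shear fails the test, in line with Theorem (7.3): it has the single eigenvalue 1 and is not diagonalizable.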


Proof of Theorem (7.3). First, any two diagonal matrices commute, so a diagonal
matrix is normal: $DD^* = D^*D$. The lemma tells us that M is normal if
$PMP^* = D$. Conversely, suppose that M is normal. Choose an eigenvector $v = v_1$ of
M, and normalize so that $(v, v) = 1$, as in the proof of (5.4). Extend $\{v_1\}$ to an
orthonormal basis. Then M will be changed to a matrix

  $M_1 = PMP^* = \begin{pmatrix} a_{11} & a_{12} \cdots a_{1n} \\ 0 & \\ \vdots & N \\ 0 & \end{pmatrix}$,   and   $M_1^* = PM^*P^* = \begin{pmatrix} \bar{a}_{11} & 0 \cdots 0 \\ \bar{a}_{12} & \\ \vdots & N^* \\ \bar{a}_{1n} & \end{pmatrix}.$

The upper left entry of $M_1^*M_1$ is $\bar{a}_{11}a_{11}$, while the same entry of $M_1M_1^*$ is
$a_{11}\bar{a}_{11} + a_{12}\bar{a}_{12} + \cdots + a_{1n}\bar{a}_{1n}$. Since M is normal, so is $M_1$, that is, $M_1^*M_1 = M_1M_1^*$. It
follows that $a_{12}\bar{a}_{12} + \cdots + a_{1n}\bar{a}_{1n} = 0$. Since $a_{1j}\bar{a}_{1j} \ge 0$, this shows that the entries $a_{1j}$
with $j > 1$ are zero and that

  $M_1 = \begin{pmatrix} a_{11} & 0 \cdots 0 \\ 0 & \\ \vdots & N \\ 0 & \end{pmatrix}.$

We continue, working on N. □


8. SKEW-SYMMETRIC FORMS


The theory of skew-symmetric forms is independent of the field of scalars. One
might expect trouble with fields of characteristic 2, in which $1 + 1 = 0$. They look
peculiar because $a = -a$ for all a, so the conditions for symmetry (1.5) and for
skew symmetry (1.6) are the same. It turns out that fields of characteristic 2 don't
cause trouble with skew-symmetric forms, if the definition of skew symmetry is
changed to handle them. The definition which works for all fields is this:


(8.1) Definition. A bilinear form (,) on a vector space V is skew-symmetric if 


(v,v) =0 
for all v € V. 
The rule 
(8.2) (v,w) = —(w,v) 


for all $v, w \in V$ continues to hold with this definition. It is proved by expanding

  $(v + w, v + w) = (v, v) + (v, w) + (w, v) + (w, w),$

and by using the fact that $(v, v) = (w, w) = (v + w, v + w) = 0$. If the characteristic
of the field of scalars is not 2, then (8.1) and (8.2) are equivalent. For if (8.2)
holds for all v, w, then setting $w = v$ we find $(v, v) = -(v, v)$. This implies that
$2(v, v) = 0$, hence that $(v, v) = 0$ unless $2 = 0$ in the field.

Note that if F has characteristic 2, then $1 = -1$ in F, so (8.2) shows that the
form is actually symmetric. But most symmetric forms don't satisfy (8.1).

The matrix A of a skew-symmetric form with respect to an arbitrary basis is
characterized by the properties

(8.3)   $a_{ii} = 0$ and $a_{ij} = -a_{ji}$, if $i \neq j$.

We take these properties as the definition of a skew-symmetric matrix. If the
characteristic is not 2, then this is equivalent with the condition

(8.4)   $A^t = -A.$

(8.5) Theorem.

(a) Let V be a vector space of dimension m over a field F, and let ( , ) be a
nondegenerate skew-symmetric form on V. Then m is an even integer, and there is a
basis B of V such that the matrix A of the form with respect to that basis is

  $J_{2n} = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix},$

where 0, I denote the n × n matrices and $n = \frac{1}{2}m$.

(b) Matrix form: Let A be a nonsingular skew-symmetric m × m matrix. Then m is
even, and there is a matrix $Q \in GL_m(F)$ such that $QAQ^t$ is the matrix $J_{2n}$.

A basis B as in (8.6a) is called a standard symplectic basis. Note that rearranging
the standard symplectic basis in the order $(v_1, v_{n+1}, v_2, v_{n+2}, \ldots, v_n, v_{2n})$ changes
the matrix $J_{2n}$ into a matrix made up of 2 × 2 blocks

  $J_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$

along the diagonal. This is the form which is most convenient for proving the
theorem. We leave the proof as an exercise. □
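
The effect of rearranging a standard symplectic basis can be seen concretely. The sketch below assumes the sign convention in which $J_{2n}$ has the identity block in the upper right and its negative in the lower left; the function names `J` and `reorder` are introduced only for this example.

```python
def J(n):
    """J_{2n}: identity block upper right, negative identity block lower left (assumed convention)."""
    m = 2 * n
    M = [[0] * m for _ in range(m)]
    for i in range(n):
        M[i][n + i] = 1
        M[n + i][i] = -1
    return M

def reorder(M, order):
    """Matrix of the same form after listing the basis vectors in the given order."""
    return [[M[i][j] for j in order] for i in order]

n = 3
# Zero-based indices of v_1, v_{n+1}, v_2, v_{n+2}, ..., v_n, v_{2n}.
order = [k for i in range(n) for k in (i, n + i)]
blocks = reorder(J(n), order)
for row in blocks:
    print(row)
```

The printed matrix has n copies of the 2 × 2 block with rows (0, 1) and (-1, 0) along the diagonal and zeros elsewhere.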


9. SUMMARY OF RESULTS, IN MATRIX NOTATION 


Real numbers: A square matrix A is symmetric if $A^t = A$ and orthogonal if
$A^t = A^{-1}$.

(1) Spectral Theorem: If A is a real symmetric matrix, there is an orthogonal
matrix P such that $PAP^t (= PAP^{-1})$ is diagonal.

(2) If A is a real symmetric matrix, there is a real invertible matrix P such that

  $PAP^t = \begin{pmatrix} I_p & & \\ & -I_m & \\ & & 0_z \end{pmatrix},$

for some integers p, m, z.

(3) Sylvester's Law: The numbers p, m, z are determined by the matrix A.

Complex numbers: A complex square matrix A is hermitian if $A^* = A$, unitary
if $A^* = A^{-1}$, and normal if $AA^* = A^*A$.

(1) Spectral Theorem: If A is a hermitian matrix, there is a unitary matrix P such
that $PAP^*$ is a real diagonal matrix.

(2) If A is a normal matrix, there is a unitary matrix P such that $PAP^*$ is diagonal.




F arbitrary: A square n × n matrix is skew-symmetric if $a_{ii} = 0$ and $a_{ij} =
-a_{ji}$ for all i, j. If A is an invertible skew-symmetric matrix, then n is even, and there
is an invertible matrix P so that $PAP^t$ has the form

  $\begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}.$

(9.1) Note. The rule $A' = (P^t)^{-1}A(P^{-1})$ for change of basis in a bilinear form (see
(1.12)) is rather ugly because of the way the matrix P of change of coordinates is
defined. It is possible to rearrange equations (4.17) of Chapter 3, by writing

(9.2) $v_i' = \sum_j q_{ij} v_j$, or $\mathbf{B}'^t = Q\mathbf{B}^t$.

This results in $Q = (P^{-1})^t$, and with this rule we obtain the nicer formula

$$A' = QAQ^t,$$

to replace (1.12). We can use it if we want to.

The problem with formula (9.2) is that change of basis on a linear transformation
gets messed up; namely the formula $A' = PAP^{-1}$ [Chapter 4 (3.4)] is replaced
by $A' = (Q^{-1})^t A Q^t$. Trying to keep the formulas neat is like trying to smooth a bump
in a rug.

This brings up an important point. Linear operators on V and bilinear forms on
V are each given by an n × n matrix A, once a basis has been chosen. One is tempted
to think that the theories of linear operators and of bilinear forms are somehow
equivalent, but they are not, unless a basis is fixed. For under a change of basis the
matrix of a bilinear form changes to $(P^t)^{-1}AP^{-1}$ (1.12), while the matrix of a linear
operator changes to $PAP^{-1}$ [Chapter 4 (3.4)]. So the new matrices are no longer
equal. To be precise, this shows that the theories diverge when the basis is changed,
unless the matrix P of change of basis happens to be orthogonal. If P is orthogonal,
then $P = (P^t)^{-1}$, and we are all right. The matrices remain equal. This is one benefit
of working with orthonormal bases.
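
The divergence described above can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names (`matmul`, `transpose`, `inverse2`, `as_form`, `as_operator`) and the sample matrices are illustrative assumptions. It applies both change-of-basis rules to the same symmetric 2 × 2 matrix, once with a shear (not orthogonal) and once with a rotation (orthogonal).

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def inverse2(X):
    """Inverse of an invertible 2x2 matrix."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def as_form(A, P):
    """Change of basis for a bilinear form: (P^t)^{-1} A P^{-1}."""
    P_inv = inverse2(P)
    return matmul(matmul(transpose(P_inv), A), P_inv)

def as_operator(A, P):
    """Change of basis for a linear operator: P A P^{-1}."""
    return matmul(matmul(P, A), inverse2(P))

def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

A = [[2.0, 1.0], [1.0, 3.0]]            # sample symmetric matrix (assumed)
shear = [[1.0, 1.0], [0.0, 1.0]]        # invertible but not orthogonal
t = 0.7
rotation = [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]  # orthogonal

shear_agrees = close(as_form(A, shear), as_operator(A, shear))
rotation_agrees = close(as_form(A, rotation), as_operator(A, rotation))
print(shear_agrees, rotation_agrees)
```

With the shear the two rules give different matrices, while with the rotation they coincide up to rounding error.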


Yvonne Verdier 
EXERCISES 


1. Definition of Bilinear Form 


1. Let A and B be real n × n matrices. Prove that if $X^tAY = X^tBY$ for all vectors X, Y in $\mathbb{R}^n$,
then A = B.

2. Prove directly that the bilinear form represented by the matrix $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$ is positive
definite if and only if a > 0 and $ad - b^2 > 0$.

3. Apply the Gram-Schmidt procedure to the basis $(1, 1, 0)^t$, $(1, 0, 1)^t$, $(0, 1, 1)^t$, when the
form is dot product.


4. Leta = i Find an orthonormal basis for R? with respect to the form X‘AY. 
5. (a) Prove that every real square matrix is the sum of a symmetric matrix and a skew-symmetric
matrix ($A^t = -A$) in exactly one way.
(b) Let $\langle\,,\rangle$ be a bilinear form on a real vector space V. Show that there is a symmetric
form $(\,,)$ and a skew-symmetric form $[\,,]$ so that $\langle\,,\rangle = (\,,) + [\,,]$.

6. Let $\langle\,,\rangle$ be a symmetric bilinear form on a vector space V over a field F. The function
$q\colon V \to F$ defined by $q(v) = \langle v, v\rangle$ is called the quadratic form associated to the bilinear
form. Show how to recover the bilinear form from q, if the characteristic of the field F is
not 2, by expanding q(v + w).

7. Let X, Y be vectors in $\mathbb{C}^n$, and assume that X ≠ 0. Prove that there is a symmetric matrix
B such that BX = Y.


2. Symmetric Forms: Orthogonality


1. Prove that a positive definite form is nondegenerate.

2. A matrix A is called positive semidefinite if $X^tAX \ge 0$ for all $X \in \mathbb{R}^n$. Prove that $A^tA$ is
positive semidefinite for any m × n real matrix A.

3. Find an orthogonal basis for the form on $\mathbb{R}^n$ whose matrix is as follows.


11 feo.1 
(a) 1 | (b);O 2 1 
ie Nami | 


4. Extend the vector $X_1 = (1, 1, 1)^t/\sqrt{3}$ to an orthonormal basis for $\mathbb{R}^3$.

*5. Prove that if the columns of an n × n matrix A form an orthonormal basis, then the rows
do too.

6. Let A, A' be symmetric matrices related by $A = P^tA'P$, where $P \in GL_n(F)$. Is it true that
the ranks of A and of A' are equal?

7. Let A be the matrix of a symmetric bilinear form $\langle\,,\rangle$ with respect to some basis. Prove or
disprove: The eigenvalues of A are independent of the basis.

8. Prove that the only real matrix which is orthogonal, symmetric, and positive definite is
the identity.

9. The vector space P of all real polynomials of degree ≤ n has a bilinear form, defined by


(f,g) = [ Slag ade. 


Find an orthonormal basis for P when n has the following values. (a) 1 (b) 2 (c) 3 


10. Let V denote the vector space of real n × n matrices. Prove that $\langle A, B\rangle = \operatorname{trace}(A^tB)$ is a
positive definite bilinear form on V. Find an orthonormal basis for this form.

11. A symmetric matrix A is called negative definite if $X^tAX < 0$ for all X ≠ 0. Give a criterion
analogous to (1.26) for a symmetric matrix A to be negative definite.


12. Prove that every symmetric nonsingular complex matrix A has the form $A = P^tP$.

13. In the notation of (2.12), show by example that the span of $(v_1, \ldots, v_p)$ is not determined
by the form.

14. (a) Let W be a subspace of a vector space V on which a symmetric bilinear form is given.
Prove that $W^\perp$ is a subspace.
(b) Prove that the null space N is a subspace.

15. Let $W_1, W_2$ be subspaces of a vector space V with a symmetric bilinear form. Prove each
of the following.
(a) $(W_1 + W_2)^\perp = W_1^\perp \cap W_2^\perp$ (b) $W \subset W^{\perp\perp}$ (c) If $W_1 \subset W_2$, then $W_1^\perp \supset W_2^\perp$.

16. Prove Proposition (2.7), that $V = W \oplus W^\perp$ if the form is nondegenerate on W.


17. Let $V = \mathbb{R}^{2\times 2}$ be the vector space of real 2 × 2 matrices.
(a) Determine the matrix of the bilinear form $\langle A, B\rangle = \operatorname{trace}(AB)$ on V with respect to the
standard basis $\{e_{ij}\}$.
(b) Determine the signature of this form.
(c) Find an orthogonal basis for this form.
(d) Determine the signature of the form on the subspace of V of matrices with trace
zero.

*18. Determine the signature of the form $\langle A, B\rangle = \operatorname{trace} AB$ on the space $\mathbb{R}^{n\times n}$ of real n × n
matrices.

19. Let $V = \mathbb{R}^{2\times 2}$ be the space of 2 × 2 matrices.
(a) Show that the form $\langle A, B\rangle$ defined by $\langle A, B\rangle = \det(A + B) - \det A - \det B$ is symmetric
and bilinear.
(b) Compute the matrix of this form with respect to the standard basis $\{e_{ij}\}$, and determine
the signature of the form.
(c) Do the same for the subspace of matrices of trace zero.

20. Do exercise 19 for $\mathbb{R}^{3\times 3}$, replacing the quadratic form det A by the coefficient of t in the
characteristic polynomial of A.

21. Decide what the analogue of Sylvester's Law for symmetric forms over complex vector
spaces is, and prove it.

22. Using the method of proof of Theorem (2.9), find necessary and sufficient conditions on
a field F so that every finite-dimensional vector space V over F with a symmetric bilinear
form $\langle\,,\rangle$ has an orthogonal basis.

I 

' 

(a) Prove that the bilinear form X'AY on F? can not be diagonalized. 

(b) Determine the orbits for the action P,A~~~» PAP' of GL2(F) on the space of 2 x 2 
matrices with coefficients in F. 


23. Let F = F., and let A = 


3. The Geometry Associated to a Positive Form 


1. Let V be a Euclidean space. Prove the Schwarz Inequality and the Triangle Inequality.

2. Let W be a subspace of a Euclidean space V. Prove that $W = W^{\perp\perp}$.

3. Let V be a Euclidean space. Show that if |v| = |w|, then $(v + w) \perp (v - w)$. Interpret
this formula geometrically.

4. Prove the parallelogram law $|v + w|^2 + |v - w|^2 = 2|v|^2 + 2|w|^2$ in a Euclidean
space.

5. Prove that the orthogonal projection (3.7) is a linear transformation.

6. Find the matrix of the projection $\pi\colon \mathbb{R}^3 \to \mathbb{R}^2$ such that the image of the standard bases
of $\mathbb{R}^3$ forms an equilateral triangle and $\pi(e_1)$ points in the direction of the x-axis.

*7. Let W be a two-dimensional subspace of $\mathbb{R}^3$, and consider the orthogonal projection $\pi$ of
$\mathbb{R}^3$ onto W. Let $(a_i, b_i)^t$ be the coordinate vector of $\pi(e_i)$, with respect to a chosen
orthonormal basis of W. Prove that $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$ are orthogonal unit vectors.

*8. Let $w \in \mathbb{R}^n$ be a vector of length 1.
(a) Prove that the matrix $P = I - 2ww^t$ is orthogonal.
(b) Prove that multiplication by P is a reflection through the space W orthogonal to w,
that is, prove that if we write an arbitrary vector v in the form v = cw + w', where
$w' \in W$, then Pv = -cw + w'.
(c) Let X, Y be arbitrary vectors in $\mathbb{R}^n$ with the same length. Determine a vector w such
that PX = Y.

*9. Use exercise 8 to prove that every orthogonal n × n matrix is a product of at most n
reflections.

10. Let A be a real symmetric matrix, and let T be the linear operator on $\mathbb{R}^n$ whose matrix
is A.
(a) Prove that $(\ker T) \perp (\operatorname{im} T)$ and that $V = (\ker T) \oplus (\operatorname{im} T)$.
(b) Prove that T is an orthogonal projection onto im T if and only if, in addition to being
symmetric, $A^2 = A$.

11. Let A be symmetric and positive definite. Prove that the maximal matrix entries are on
the diagonal.


4. Hermitian Forms


1. Verify rules (4.4).

2. Show that the dot product form $(X \cdot Y) = X^tY$ is not positive definite on $\mathbb{C}^n$.

3. Prove that a matrix A is hermitian if and only if the associated form $X^*AY$ is a hermitian
form.

4. Prove that if $X^*AX$ is real for all complex vectors X, then A is hermitian.

5. Prove that the n × n hermitian matrices form a real vector space, and find a basis for that
space.

6. Let V be a two-dimensional hermitian space. Let $(v_1, v_2)$ be an orthonormal basis for V.
Describe all orthonormal bases $(v_1', v_2')$ with $v_1 = v_1'$.

7. Let $X, Y \in \mathbb{C}^n$ be orthogonal vectors. Prove that $|X + Y|^2 = |X|^2 + |Y|^2$.

8. Is $\langle X, Y\rangle = x_1y_1 + ix_1y_2 - ix_2y_1 + ix_2y_2$ on $\mathbb{C}^2$ a hermitian form?

9. Let A, B be positive definite hermitian matrices. Determine which of the following matrices
are positive definite hermitian: $A^2$, $A^{-1}$, $AB$, $A + B$.

10. Prove that the determinant of a hermitian matrix is a real number.

11. Prove that A is positive definite hermitian if and only if $A = P^*P$ for some invertible matrix
P.

12. Prove Theorem (4.19), that a hermitian form on a complex vector space V has an orthonormal
basis if and only if it is positive definite.

13. Extend the criterion (1.26) for positive definiteness to hermitian matrices.

14. State and prove an analogue of Sylvester's Law for hermitian matrices.


15. Let (,) be a hermitian form on a complex vector space V, and let {v, w} denote the real 
part of the complex number (v, w). Prove that if V is regarded as a real vector space, 
then { ,} is a symmetric bilinear form on V, and if ( , ) is positive definite, then { , } is too. 
What can you say about the imaginary part? 


16. Let P be the vector space of polynomials of degree ≤ n.
(a) Show that

$$\langle f, g\rangle = \int_0^{2\pi} \overline{f(e^{i\theta})}\, g(e^{i\theta})\, d\theta$$

is a positive definite hermitian form on P.
(b) Find an orthonormal basis for this form.

17. Determine whether or not the following rules define hermitian forms on the space $\mathbb{C}^{n\times n}$
of complex matrices, and if so, determine their signature.
(a) $A, B \mapsto \operatorname{trace}(A^*B)$ (b) $A, B \mapsto \operatorname{trace}(AB)$

18. Let A be a unitary matrix. Prove that |det A| = 1.

19. Let P be a unitary matrix, and let $X_1, X_2$ be eigenvectors for P, with distinct eigenvalues
$\lambda_1, \lambda_2$. Prove that $X_1$ and $X_2$ are orthogonal with respect to the standard hermitian product
on $\mathbb{C}^n$.

*20. Let A be any complex matrix. Prove that $I + A^*A$ is nonsingular.
21. Prove Proposition (4.20). 


5. The Spectral Theorem 


1. Prove that if T is a hermitian operator then the rule $\{v, w\} = \langle v, Tw\rangle = X^*MY$ defines a
second hermitian form on V.

2. Prove that the eigenvalues of a real symmetric matrix are real numbers.

3. Prove that eigenvectors associated to distinct eigenvalues of a hermitian matrix A are
orthogonal.


4. Find a unitary matrix P so that PAP* is diagonal, when 


LE 


5. Find a real orthogonal matrix P so that PAP' is diagonal, when 


12 tL Peo 1 
@a=|) i (Wa=it | th @GAa= Tesi: 
aaa ee | 1 0 0 


6. Prove the equivalence of conditions (a) and (b) of the Spectral Theorem. 


7. Prove that a real symmetric matrix A is positive definite if and only if its eigenvalues are 
positive. 


8. Show that the only matrix which is both positive definite hermitian and unitary is the
identity I.

9. Let A be a real symmetric matrix. Prove that $e^A$ is symmetric and positive definite.

10. Prove that for any square matrix A, $\ker A = (\operatorname{im} A^*)^\perp$.

11. Let $\zeta = e^{2\pi i/n}$, and let A be the n × n matrix $a_{jk} = \zeta^{jk}/\sqrt{n}$. Prove that A is unitary.

12. Show that for every complex matrix A there is a unitary matrix P such that $PAP^*$ is upper
triangular.

13. Let A be a hermitian matrix. Prove that there is a unitary matrix P with determinant 1
such that $PAP^*$ is diagonal.

*14. Let A, B be hermitian matrices which commute. Prove that there is a unitary matrix P
such that $PAP^*$ and $PBP^*$ are both diagonal.

15. Use the Spectral Theorem to give a new proof of the fact that a positive definite real
symmetric n × n matrix P has the form $P = AA^t$ for some n × n matrix A.

16. Let $\lambda, \mu$ be distinct eigenvalues of a complex symmetric matrix A, and let X, Y be eigenvectors
associated to these eigenvalues. Prove that X is orthogonal to Y with respect to dot
product.

6. Conics and Quadrics 


1. Determine the type of the quadric $x^2 + 4xy + 2xz + z^2 + 3x + z - 6 = 0$.

2. Suppose that (6.1) represents an ellipse. Instead of diagonalizing the form and then making
a translation to reduce to the standard type, we could make the translation first.
Show how to compute the required translation by calculus.

3. Discuss all degenerate loci for conics.

4. Give a necessary and sufficient condition, in terms of the coefficients of its equation, for
a conic to be a circle.

5. (a) Describe the types of conic in terms of the signature of the quadratic form.
(b) Do the same for quadrics in $\mathbb{R}^3$.

6. Describe the degenerate quadrics, that is, those which are not listed in (6.14).


7. The Spectral Theorem for Normal Operators 


1. Show that for any normal matrix A, $\ker A = (\operatorname{im} A)^\perp$.

2. Prove or disprove: If A is a normal matrix and W is an A-invariant subspace of $V = \mathbb{C}^n$,
then $W^\perp$ is also A-invariant.

3. A matrix is skew-hermitian if $A^* = -A$. What can you say about the eigenvalues and the
possibility of diagonalizing such a matrix?

4. Prove that the cyclic shift operator


is normal, and determine its diagonalization.

5. Let P be a real matrix which is normal and has real eigenvalues. Prove that P is
symmetric.

6. Let P be a real skew-symmetric matrix. Prove that P is normal.

*7. [...] is a normal matrix.


8. (a) Let A be a complex symmetric matrix. Prove that eigenvectors of A with distinct eigenvalues
are orthogonal with respect to the bilinear form $X^tY$.
*(b) Give an example of a complex symmetric matrix A such that there is no $P \in O_n(\mathbb{C})$
with $PAP^t$ diagonal.

9. Let A be a normal matrix. Prove that A is hermitian if and only if all eigenvalues of A are
real, and that A is unitary if and only if every eigenvalue has absolute value 1.

10. Let V be a finite-dimensional complex vector space with a positive definite hermitian
form $\langle\,,\rangle$, and let $T\colon V \to V$ be a linear operator on V. Let A be the matrix of T with
respect to an orthonormal basis B. The adjoint operator $T^*\colon V \to V$ is defined as the
operator whose matrix with respect to the same basis is $A^*$.
(a) Prove that T and $T^*$ are related by the equations $\langle Tv, w\rangle = \langle v, T^*w\rangle$ and
$\langle v, Tw\rangle = \langle T^*v, w\rangle$ for all $v, w \in V$. Prove that the first of these equations characterizes
$T^*$.
(b) Prove that $T^*$ does not depend on the choice of orthonormal basis.
(c) Let v be an eigenvector for T with eigenvalue $\lambda$, and let $W = v^\perp$ be the space of
vectors orthogonal to v. Prove that W is $T^*$-invariant.

11. Prove that for any linear operator T, $TT^*$ is hermitian.

12. Let V be a finite-dimensional complex vector space with a positive definite hermitian
form $\langle\,,\rangle$. A linear operator $T\colon V \to V$ is called normal if $TT^* = T^*T$.
(a) Prove that T is normal if and only if $\langle Tv, Tw\rangle = \langle T^*v, T^*w\rangle$ for all $v, w \in V$, and
verify that hermitian operators and unitary operators are normal.
(b) Assume that T is a normal operator, and let v be an eigenvector for T, with eigenvalue
$\lambda$. Prove that v is also an eigenvector for $T^*$, and determine its eigenvalue.
(c) Prove that if v is an eigenvector, then $W = v^\perp$ is T-invariant, and use this to prove
the Spectral Theorem for normal operators.


8. Skew-Symmetric Forms


1. Prove or disprove: A matrix A is skew-symmetric if and only if $X^tAX = 0$ for all X.

2. Prove that a form is skew-symmetric if and only if its matrix has the properties (8.4).

3. Prove or disprove: A skew-symmetric n × n matrix is singular if n is odd.

4. Prove or disprove: The eigenvalues of a real skew-symmetric matrix are purely
imaginary.

5. Let S be a real skew-symmetric matrix. Prove that I + S is invertible, and that
$(I - S)(I + S)^{-1}$ is orthogonal.

*6. Let A be a real skew-symmetric matrix.
(a) Prove that $\det A \ge 0$.
(b) Prove that if A has integer entries, then det A is the square of an integer.

7. Let $\langle\,,\rangle$ be a skew-symmetric form on a vector space V. Define orthogonality, null space,
and nondegenerate forms as in Section 2.
(a) Prove that the form is nondegenerate if and only if its matrix with respect to any basis
is nonsingular.
(b) Prove that if W is a subspace such that the restriction of the form to W is nondegenerate,
then $V = W \oplus W^\perp$.
(c) Prove that if the form is not identically zero, then there is a subspace W and a basis
of W such that the restriction of the form to W has matrix $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.
(d) Prove Theorem (8.6).


9. Summary of Results, in Matrix Notation


1. Determine the symmetry of the matrices AB + BA and AB — BA in the following cases. 
(a) A,B symmetric (b) A,B hermitian (c) A,B skew-symmetric (d) A symmetric, 
B skew-symmetric 
2. State which of the following rules define operations of GL,(C) on the space C””” of all 
complex matrices: 


P, Am PAP’, (P')'a(P '), (PPS Po lAPt, APY, Pla. 


3. (a) With each of the following types of matrices, describe the possible determinants: 
(i) real orthogonal (ii) complex orthogonal (iii) unitary (iv) hermitian 
(v) symplectic (vi) real symmetric, positive definite (vii) real symmetric, nega- 
tive definite 
(b) Which of these types of matrices have only real eigenvalues? 


' , ; WOES. ; 
4. (a) Let E be an arbitrary complex matrix. Prove that the matrix \ ane | is invertible. 


CPD 
*5. (a) What is wrong with the following argument? Let P be a real orthogonal matrix. Let X
be a (possibly complex) eigenvector of P, with eigenvalue $\lambda$. Then $X^tP^tX = (PX)^tX =
\lambda X^tX$. On the other hand, $X^tP^tX = X^t(P^{-1}X) = \lambda^{-1}X^tX$. Therefore $\lambda = \lambda^{-1}$, and so
$\lambda = \pm 1$.
(b) State and prove a correct theorem based on this argument.
*6. Show how to describe any element of $SO_4$ in terms of rotations of two orthogonal planes
in $\mathbb{R}^4$.
*7. Let A be a real m × n matrix. Prove that there are orthogonal matrices $P \in O_m$ and
$Q \in O_n$ such that PAQ = D is diagonal, with nonnegative diagonal entries.


(b) Find the inverse in block form i g | 


Chapter 8 


Linear Groups 


In these days the angel of topology and the devil of abstract algebra 
fight for the soul of every individual discipline of mathematics. 


Hermann Weyl


1. THE CLASSICAL LINEAR GROUPS


Subgroups of the general linear group GL, are called linear groups. In this chapter 
we will study the most important ones: the orthogonal, unitary, and symplectic 
groups. They are called the classical groups. 

The classical groups arise as stabilizers for some natural operations of $GL_n$ on
the space of n × n matrices. The first of these operations is that which describes
change of basis in a bilinear form. The rule

(1.1) $P, A \mapsto (P^t)^{-1}AP^{-1}$

is an operation of $GL_n$ on the set of all n × n matrices. This is true for any field of
scalars, but we will be interested in the real and complex cases. As we have seen in
Chapter 7 (1.15), the orbit of a matrix A under this operation is the set of matrices A'
which represent the form $X^tAY$, but with respect to different bases. It is customary to
call matrices in the same orbit congruent. We can set $Q = (P^t)^{-1}$ to obtain the equivalent
definition

(1.2) A and A' are congruent if $A' = QAQ^t$ for some $Q \in GL_n(F)$.


Sylvester's Law [Chapter 7 (2.11)] describes the different orbits or congruence
classes of real symmetric matrices. Every congruence class of real symmetric matrices
contains exactly one matrix of the form Chapter 7 (2.10). The orthogonal
group, which we have defined before, is the stabilizer of the identity matrix for this
operation. As before, we will denote the real orthogonal group by the symbol $O_n$:

(1.3) $O_n = \{P \in GL_n(\mathbb{R}) \mid P^tP = I\}$.

The complex orthogonal group is defined analogously:

$$O_n(\mathbb{C}) = \{P \in GL_n(\mathbb{C}) \mid P^tP = I\}.$$

The stabilizer of the Lorentz form [Chapter 7 (2.16)], defined by the matrix

$$I_{3,1} = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{bmatrix},$$

is called the Lorentz group. It is denoted by $O_{3,1}(\mathbb{R})$ or $O_{3,1}$:

(1.4) $O_{3,1} = \{P \in GL_4(\mathbb{R}) \mid P^tI_{3,1}P = I_{3,1}\}$.

The linear operators represented by these matrices are often called Lorentz transformations.
The subscript (3,1) indicates the signature of the matrix, the number of
+1's and -1's. In this way an analogous group $O_{p,q}$ can be defined for any signature
(p,q).
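
Because $(P^t)^{-1}MP^{-1} = M$ is equivalent to $M = P^tMP$, membership in one of these stabilizers can be tested by a direct matrix computation. The sketch below is only an illustration under assumed sample data: the function names, the rotation angle, and the boost parameter are not from the text. It tests a plane rotation against the identity matrix and a hyperbolic "boost" against $I_{3,1}$.

```python
import math

def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    n = len(X)
    return [[X[j][i] for j in range(n)] for i in range(n)]

def stabilizes(P, M, tol=1e-12):
    """Return True if P^t M P equals M up to rounding error."""
    N = matmul(matmul(transpose(P), M), P)
    n = len(M)
    return all(abs(N[i][j] - M[i][j]) < tol
               for i in range(n) for j in range(n))

def diagonal(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

theta, s = 0.3, 0.8                       # assumed sample parameters
c, d = math.cos(theta), math.sin(theta)
rotation = [[c, -d, 0.0], [d, c, 0.0], [0.0, 0.0, 1.0]]

ch, sh = math.cosh(s), math.sinh(s)
boost = [[ch, 0.0, 0.0, sh],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [sh, 0.0, 0.0, ch]]

rotation_in_O3 = stabilizes(rotation, diagonal([1.0, 1.0, 1.0]))
boost_in_O31 = stabilizes(boost, diagonal([1.0, 1.0, 1.0, -1.0]))
boost_in_O4 = stabilizes(boost, diagonal([1.0, 1.0, 1.0, 1.0]))
print(rotation_in_O3, boost_in_O31, boost_in_O4)
```

The boost preserves the Lorentz form but not the dot product, so it lies in $O_{3,1}$ and not in $O_4$.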
The operation (1.1) also describes change of basis in forms which are not symmetric.
Thus Theorem (8.6) of Chapter 7 tells us this:


(1.5) Corollary. There is exactly one congruence class of real nonsingular skew-symmetric
m × m matrices, if m is even. □


The standard skew-symmetric form is defined by the 2n × 2n matrix J (Chapter 7
(8.5)), and its stabilizer is called the symplectic group

(1.6) $SP_{2n}(\mathbb{R}) = \{P \in GL_{2n}(\mathbb{R}) \mid P^tJP = J\}$.

Again, the complex symplectic group $SP_{2n}(\mathbb{C})$ is defined analogously.
Finally, the unitary group is defined in terms of the operation
Finally, the unitary group is defined in terms of the operation 


(1.7) $P, A \mapsto (P^*)^{-1}AP^{-1}$.


This definition makes sense only when the field of scalars is the complex field. Ex- 
actly as with bilinear forms, the orbit of a matrix A consists of the matrices which 
define the form (X,Y) = X*AY with respect to different bases (see [Chapter 7 
(4.12)]). The unitary group is the stabilizer of the identity matrix for this action: 


(1.8) $U_n = \{P \mid P^*P = I\}$.

Thus $U_n$ is the group of matrices representing changes of basis which leave the hermitian
dot product [Chapter 7 (4.2)] $X^*Y$ invariant.

The word special is added to indicate the subgroup of matrices with determinant 1.
This gives us some more groups:

Special linear group $SL_n(\mathbb{R})$: n × n matrices P with determinant 1;
Special orthogonal group $SO_n(\mathbb{R})$: the intersection $SL_n(\mathbb{R}) \cap O_n(\mathbb{R})$;
Special unitary group $SU_n$: the intersection $SL_n(\mathbb{C}) \cap U_n$.

Though this is not obvious from the definition, symplectic matrices have determinant
1, so the two uses of the letter S do not cause conflict.


2. THE SPECIAL UNITARY GROUP $SU_2$


The main object of this chapter is to describe the geometric properties of the classical
linear groups, by considering them as subsets of the spaces $\mathbb{R}^{n\times n}$ or $\mathbb{C}^{n\times n}$ of all
matrices. We know the geometry of a few groups already. For example, $GL_1(\mathbb{C}) =
\mathbb{C}^\times$ is the "punctured plane" $\mathbb{C} - \{0\}$. Also, if p is a 1 × 1 matrix, then $p^* = \bar p$.
Thus

(2.1) $U_1 = \{p \in \mathbb{C}^\times \mid \bar p p = 1\}$.

This is the set of complex numbers of absolute value 1, the unit circle in the complex
plane. We can identify it with the unit circle in $\mathbb{R}^2$,

$$x_1^2 + x_2^2 = 1,$$

by sending $x_1 + x_2 i \mapsto (x_1, x_2)$. The group $SO_2$ of rotations of the plane is isomorphic
to $U_1$. It is also a circle, embedded into $\mathbb{R}^{2\times 2}$ by the map

(2.2) $(x_1, x_2) \mapsto \begin{bmatrix} x_1 & -x_2 \\ x_2 & x_1 \end{bmatrix}$.

We will describe some more of the groups in the following sections.
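
The isomorphism between $U_1$ and $SO_2$ can be illustrated numerically: the map (2.2) should carry multiplication of unit complex numbers to multiplication of rotation matrices. The following sketch assumes two arbitrary sample angles and uses illustrative helper names that do not come from the text.

```python
import cmath

def to_rotation(p):
    """Map the complex number x1 + x2 i to the matrix [[x1, -x2], [x2, x1]] of (2.2)."""
    return [[p.real, -p.imag], [p.imag, p.real]]

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

p = cmath.exp(0.4j)   # a point of U_1 (assumed sample angle)
q = cmath.exp(1.1j)   # another point of U_1

# The image of a product equals the product of the images.
respects_products = close(to_rotation(p * q),
                          matmul(to_rotation(p), to_rotation(q)))
# The identity element 1 of U_1 goes to the identity matrix.
sends_one_to_identity = close(to_rotation(1 + 0j), [[1.0, 0.0], [0.0, 1.0]])
print(respects_products, sends_one_to_identity)
```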

The dimension of a linear group G is, roughly speaking, the number of degrees
of freedom of a matrix in G. The group $SO_2$, for example, has dimension 1. A matrix
in $SO_2$ represents rotation by an angle $\theta$, and this angle is the single parameter
needed to determine the rotation. We will discuss dimension more carefully in Section 7,
but we want to describe some of the low-dimensional groups explicitly first.
The smallest dimension in which really interesting groups appear is 3, and three of
these, $SU_2$, $SO_3$, and $SL_2(\mathbb{R})$, are very important. We will study the special unitary
group $SU_2$ in this section.


Let $P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be an element of $SU_2$, with $a, b, c, d \in \mathbb{C}$. The equations
defining $SU_2$ are $P^*P = I$ and $\det P = 1$. By Cramer's Rule,

$$P^{-1} = (\det P)^{-1}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$

Since $P^{-1} = P^*$ for a matrix in $SU_2$, we find $\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} \bar a & \bar c \\ \bar b & \bar d \end{bmatrix}$, or

(2.3) $\bar a = d$, and $\bar b = -c$.

Thus

(2.4) $P = \begin{bmatrix} a & b \\ -\bar b & \bar a \end{bmatrix}$.

The condition det P = 1 has become lost in the computation and must be put back:

(2.5) $\bar a a + \bar b b = 1$.

Equations (2.3) and (2.5) provide a complete list of conditions describing the entries
of a matrix in $SU_2$. The matrix P is described by the vector $(a, b) \in \mathbb{C}^2$ of length 1,
and any such vector gives us a matrix $P \in SU_2$ by the rule (2.4).

If we write out a, b in terms of their real and imaginary parts, equation (2.5)
gives us a bijective correspondence between $SU_2$ and points of $\mathbb{R}^4$ lying on the locus

(2.6) $x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1$.

This equation is equivalent to (2.5) if we set $a = x_1 + x_2 i$ and $b = x_3 + x_4 i$.
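
The correspondence between points of the locus (2.6) and matrices of the form (2.4) is easy to test on an example. The sketch below is illustrative only: the function names and the sample point are assumptions. It builds P from a point of $\mathbb{R}^4$ and checks the two defining conditions $P^*P = I$ and $\det P = 1$.

```python
def su2_from_point(x1, x2, x3, x4):
    """Build the matrix (2.4) from a = x1 + x2 i and b = x3 + x4 i."""
    a = complex(x1, x2)
    b = complex(x3, x4)
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def adjoint(P):
    """Conjugate transpose P* of a 2x2 complex matrix."""
    return [[P[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

# Sample point on the unit 3-sphere: 4 * (0.5)^2 = 1.
P = su2_from_point(0.5, 0.5, 0.5, 0.5)

product = matmul(adjoint(P), P)
identity = [[1, 0], [0, 1]]
is_unitary = all(abs(product[i][j] - identity[i][j]) < 1e-12
                 for i in range(2) for j in range(2))
has_det_one = abs(det(P) - 1) < 1e-12
print(is_unitary, has_det_one)
```

A point off the sphere, such as (1, 1, 0, 0), would still give a matrix of the shape (2.4), but its determinant would be $\bar a a + \bar b b = 2$ rather than 1.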

The locus (2.6) is called the unit 3-sphere in $\mathbb{R}^4$, in analogy with the unit
sphere in $\mathbb{R}^3$. The number 3 refers to its dimension, the number of degrees of freedom
of a point on the sphere. Thus the unit sphere

$$x_1^2 + x_2^2 + x_3^2 = 1$$

in $\mathbb{R}^3$, being a surface, is called a 2-sphere. The unit circle in $\mathbb{R}^2$, a curve, is called a
1-sphere. We will sometimes denote a sphere of dimension d by $S^d$.

A bijective map $f\colon S \to S'$ between subsets of Euclidean spaces is called a
homeomorphism if f and $f^{-1}$ are continuous maps (Appendix, Section 3). The correspondence
between $SU_2$, considered as a subset of $\mathbb{C}^{2\times 2}$, and the sphere (2.6) is obviously
continuous, as is its inverse. Therefore these two spaces are homeomorphic.


(2.7) $SU_2$ is homeomorphic to the unit 3-sphere in $\mathbb{R}^4$.


It is convenient to identify $SU_2$ with the 3-sphere. We can do this if we represent
the matrix (2.4) by its top row, the vector $(a, b) \in \mathbb{C}^2$, or by the vector
$(x_1, x_2, x_3, x_4) \in \mathbb{R}^4$. These representations can be thought of as different notations
for the same element P of the group, and we will pass informally from one representation
to the other. For geometric visualization, the representations $P = (a, b)$ and
$P = (x_1, x_2, x_3, x_4)$, being in lower-dimensional spaces, are more convenient.

The fact that the 3-sphere has a group structure is remarkable, because there is
no way to make the 2-sphere into a group with a continuous law of composition. In
fact, a famous theorem of topology asserts that the only spheres with continuous
group laws are the 1-sphere, which is realized as the rotation group $SO_2$, and the
3-sphere $SU_2$.

We will now describe the algebraic structures on $SU_2$ analogous to the curves
of constant latitude and longitude on the 2-sphere. The matrices I, -I will play the
roles of the north and south poles. In our vector notation, they are the points
$(\pm 1, 0, 0, 0)$ of the sphere.

If the poles of the 2-sphere $x_1^2 + x_2^2 + x_3^2 = 1$ are placed at the points
$(\pm 1, 0, 0)$, then the latitudes are the circles $x_1 = c$, $-1 < c < 1$. The analogues on
the 3-sphere $SU_2$ of these latitudes are the surfaces on which the $x_1$-coordinate is
constant. They are two-dimensional spheres, embedded into $\mathbb{R}^4$ by

(2.8) $x_1 = c$ and $x_2^2 + x_3^2 + x_4^2 = (1 - c^2)$, $-1 < c < 1$.

These sets can be described algebraically as conjugacy classes in $SU_2$.


(2.9) Proposition. Except for two special classes, the conjugacy classes in $SU_2$ are
the latitudes, the sets defined by the equations (2.8). For a given c in the interval
(-1, 1), this set consists of all matrices $P \in SU_2$ such that trace P = 2c. The remaining
conjugacy classes are {I} and {-I}, each consisting of one element. These two
classes make up the center $Z = \{\pm I\}$ of the group $SU_2$.


Proof. The characteristic polynomial of the matrix P (2.4) is

(2.10) $t^2 - (a + \bar a)t + 1 = t^2 - 2x_1t + 1$.

This polynomial has a pair $\lambda, \bar\lambda$ of complex conjugate roots on the unit circle, and
the roots, the eigenvalues of P, depend only on trace $P = 2x_1$. Furthermore, two matrices
with different traces have different eigenvalues. The proposition will follow if
we show that the conjugacy class of P contains every matrix in $SU_2$ with the same
eigenvalues. The cases $x_1 = 1, -1$ correspond to the two special conjugacy classes
{I}, {-I}, so the proof is completed by the next lemma.

(2.11) Lemma. Let P be an element of $SU_2$, with eigenvalues $\lambda, \bar\lambda$. Then P is conjugate
in $SU_2$ to the matrix

$$\begin{bmatrix} \lambda & 0 \\ 0 & \bar\lambda \end{bmatrix}.$$


Proof. By the Spectral Theorem for normal operators [Chapter 7 (7.3)], there
is a unitary matrix Q so that $QPQ^*$ is diagonal. We only have to show that Q can be
chosen so as to have determinant 1. Say that $\det Q = \delta$. Since $Q^*Q = I$,
$(\det Q^*)(\det Q) = \bar\delta\delta = 1$; hence $\delta$ has absolute value 1. Let $\varepsilon$ be a square root of $\delta$.
Then $\bar\varepsilon\varepsilon = 1$ too. The matrix $Q_1 = \bar\varepsilon Q$ has determinant $\bar\varepsilon^2\delta = 1$, so it is in $SU_2$, and
$P_1 = Q_1PQ_1^*$ is also diagonal.
The diagonal entries of $P_1$ are the eigenvalues $\lambda, \bar\lambda$. The eigenvalues can be interchanged,
if desired, by conjugating by the matrix

(2.12) $Q = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$,

which is also an element of $SU_2$. □
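
Two facts used in this argument can be checked on an example: conjugation in $SU_2$ does not change the trace $2x_1$, and conjugating a diagonal matrix by the matrix (2.12) interchanges its diagonal entries. The sketch below uses assumed sample points on the 3-sphere and illustrative helper names; it is a numerical check, not a proof.

```python
import cmath

def su2_from_point(x1, x2, x3, x4):
    """Matrix of the form (2.4) with a = x1 + x2 i, b = x3 + x4 i."""
    a, b = complex(x1, x2), complex(x3, x4)
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def adjoint(P):
    return [[P[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conjugate_by(Q, P):
    """Return Q P Q*."""
    return matmul(matmul(Q, P), adjoint(Q))

def trace(P):
    return P[0][0] + P[1][1]

P = su2_from_point(0.5, 0.5, 0.5, 0.5)   # sample element, x1 = 0.5
Q = su2_from_point(0.6, 0.0, 0.8, 0.0)   # another sample element of SU_2

# Conjugation keeps the trace 2*x1, so it preserves each latitude.
trace_preserved = abs(trace(conjugate_by(Q, P)) - trace(P)) < 1e-12

# Conjugating diag(lam, conj(lam)) by the matrix (2.12) swaps the eigenvalues.
lam = cmath.exp(0.9j)
D = [[lam, 0], [0, lam.conjugate()]]
swap = [[0, 1], [-1, 0]]
swapped = conjugate_by(swap, D)
eigenvalues_swapped = (abs(swapped[0][0] - lam.conjugate()) < 1e-12
                       and abs(swapped[1][1] - lam) < 1e-12
                       and abs(swapped[0][1]) < 1e-12
                       and abs(swapped[1][0]) < 1e-12)
print(trace_preserved, eigenvalues_swapped)
```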


Next we will introduce the longitudes of $SU_2$. The longitudes on the 2-sphere
$x_1^2 + x_2^2 + x_3^2 = 1$ can be described as intersections of the sphere with planes containing
the two poles $(\pm 1, 0, 0)$. When we add a fourth variable $x_4$ to get the equation
of the 3-sphere, a natural way to extend this definition is to form the intersection
with a two-dimensional subspace of $\mathbb{R}^4$ containing the two poles $\pm I$. This is a
circle in $SU_2$, and we will think of these circles as the longitudes. Thus while the latitudes
on $SU_2$ are 2-spheres, the longitudes are 1-spheres, the "great circles" through
the poles.

Note that every point $P = (x_1, x_2, x_3, x_4)$ of $SU_2$ except for the poles is contained
in exactly one longitude. This is because if P is not a pole, then P and I will
be linearly independent and thus will span a subspace V of $\mathbb{R}^4$ of dimension 2. The
intersection $SU_2 \cap V$ is the unique longitude containing P.

The intersection of $SU_2$ with the plane W defined by $x_3 = x_4 = 0$ is a particularly
nice longitude. In matrix notation, this great circle consists of the diagonal matrices
in $SU_2$, which form a subgroup T:

(2.13) $T = \left\{ \begin{bmatrix} \lambda & 0 \\ 0 & \bar\lambda \end{bmatrix} \;\middle|\; \bar\lambda\lambda = 1 \right\}$.

The other longitudes are described in the following proposition.


(2.14) Proposition. The longitudes of $SU_2$ are the conjugate subgroups $QTQ^*$ of
the subgroup T.


(2.15) Figure. Some latitudes and longitudes in $SU_2$. [Figure labels: $SO_2$; diagonal matrices; trace-zero matrices; $SU_2$.]


In Figure (2.15) the 3-sphere $SU_2$ is projected from $\mathbb{R}^4$ onto the unit disc in the
plane. The conjugacy class shown is the "equatorial" latitude in $\mathbb{R}^4$, which is defined
by the equation $x_1 = 0$. Just as the orthogonal projection of a circle from $\mathbb{R}^3$ to $\mathbb{R}^2$ is
an ellipse, the projection of this 2-sphere from $\mathbb{R}^4$ to $\mathbb{R}^3$ is an ellipsoid, and the further
projection of this ellipsoid to the plane is the elliptical disc shown.


Proof of Proposition (2.14). The point here is to show that any conjugate subgroup
$QTQ^*$ is a longitude. Lemma (2.11) tells us that every element $P \in SU_2$ lies in
one of these conjugate subgroups (though the roles of Q and $Q^*$ have been reversed).
Since every $P \ne \pm I$ is contained in exactly one longitude, it will follow that every
longitude is one of the subgroups $QTQ^*$.

So let us show that a conjugate subgroup $QTQ^*$ is a longitude. The reason this
is true is that conjugation by a fixed element Q is a linear operator which sends the
subspace W to another subspace. We will compute the conjugate explicitly to make
this clear. Say that Q is the matrix (2.4). Let $w = (w_1, w_2, 0, 0)$ denote a variable element
of W, and set $z = w_1 + w_2 i$. Then

$$QwQ^* = \begin{bmatrix} a & b \\ -\bar b & \bar a \end{bmatrix}\begin{bmatrix} z & 0 \\ 0 & \bar z \end{bmatrix}\begin{bmatrix} \bar a & -b \\ \bar b & a \end{bmatrix} = \begin{bmatrix} a\bar a z + b\bar b\bar z & ab(\bar z - z) \\ * & * \end{bmatrix}.$$

Computing these entries, we find that w is sent to the vector $u = (u_1, u_2, u_3, u_4)$,
where

$$u_1 = w_1, \quad u_2 = (x_1^2 + x_2^2 - x_3^2 - x_4^2)w_2,$$

$$u_3 = 2(x_1x_4 + x_2x_3)w_2, \quad u_4 = 2(x_2x_4 - x_1x_3)w_2.$$

The coordinates $u_i$ are real linear combinations of $(w_1, w_2)$. This shows that the map
$w \mapsto u$ is a real linear transformation. So its image V is a subspace of $\mathbb{R}^4$. The conjugate
group $QTQ^*$ is $SU_2 \cap V$. Since $QTQ^*$ contains the poles $\pm I$, so does V, and
this shows that $QTQ^*$ is a longitude. □
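
The coordinate formulas for u can be checked by carrying out the matrix product numerically. In the sketch below the point $(x_1, x_2, x_3, x_4)$ and the pair $(w_1, w_2)$ are arbitrary sample values chosen for illustration, and the helper names are not from the text. The top row of $QwQ^*$ is read off as $(u_1 + u_2 i,\; u_3 + u_4 i)$ and compared with the closed formulas.

```python
def adjoint(P):
    return [[P[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Sample point on the 3-sphere: 0.01 + 0.49 + 0.25 + 0.25 = 1.
x1, x2, x3, x4 = 0.1, 0.7, 0.5, 0.5
a, b = complex(x1, x2), complex(x3, x4)
Q = [[a, b], [-b.conjugate(), a.conjugate()]]

# A variable element of the plane W, written as the diagonal matrix diag(z, conj z).
w1, w2 = 0.3, -1.2
z = complex(w1, w2)
w = [[z, 0], [0, z.conjugate()]]

conj = matmul(matmul(Q, w), adjoint(Q))

# Coordinates of the result, read from the top row (a', b') of the matrix.
u_computed = (conj[0][0].real, conj[0][0].imag, conj[0][1].real, conj[0][1].imag)

# Closed formulas from the proof.
u_formula = (w1,
             (x1**2 + x2**2 - x3**2 - x4**2) * w2,
             2 * (x1 * x4 + x2 * x3) * w2,
             2 * (x2 * x4 - x1 * x3) * w2)

formulas_agree = all(abs(p - q) < 1e-12 for p, q in zip(u_computed, u_formula))
print(formulas_agree)
```

Since the formulas are linear in $(w_1, w_2)$, the check does not require w itself to lie on the sphere.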


We will describe another geometric configuration briefly: As we have seen, the
subgroup $T$ of diagonal matrices is a great circle in the 3-sphere $SU_2$. The left cosets
of this subgroup, the sets of the form $QT$ for $Q \in SU_2$, are also great circles, and
they partition the group $SU_2$. Thus the 3-sphere is partitioned into great circles. This
very interesting configuration is called the Hopf fibration.


3. THE ORTHOGONAL REPRESENTATION OF $SU_2$


We saw in the last section that the conjugacy classes in the special unitary group $SU_2$
are two-dimensional spheres. Since conjugacy classes are orbits for the operation of
conjugation, $SU_2$ operates on these spheres. In this section we will show that
conjugation by an element $P \in SU_2$ acts on each of the spheres as a rotation, and that the
map sending $P$ to the matrix of this rotation defines a surjective homomorphism


(3.1) $\varphi: SU_2 \longrightarrow SO_3$,


whose kernel is the center $Z = \{\pm I\}$ of $SU_2$. This homomorphism is called the
orthogonal representation of $SU_2$. It represents a complex $2 \times 2$ matrix $P$ in $SU_2$ by a
real $3 \times 3$ rotation matrix $\varphi(P)$.

The safest way to show that $P$ operates by rotating a conjugacy class may be to
write the matrix representing the rotation down explicitly. This is done in (3.11).
However, the formula for $\varphi(P)$ is complicated and not particularly enlightening. It is
better to describe $\varphi$ indirectly, as we will do presently. Let us discuss the geometry
of the map first.

Since the kernel of $\varphi$ is $\{\pm I\}$, its cosets are the sets $\{\pm P\}$. They form the fibres
of the homomorphism. Thus every element of $SO_3$ corresponds to a pair of unitary
matrices which differ by sign. Because of this, the group $SU_2$ is called a double
covering of the group $SO_3$.

The map $\mu: SO_2 \longrightarrow SO_2$ of the 1-sphere to itself defined by $\rho_\theta \mapsto \rho_{2\theta}$ is
another example of a double covering. Its kernel also consists of two elements, the
identity and the rotation by $\pi$. Every fibre of $\mu$ contains two rotations $\rho_\theta$ and $\rho_{\pi+\theta}$.


(3.2) Figure. A double covering of the 1-sphere.


The orthogonal representation can be used to identify the topological structure
of the rotation group. In vector notation, if $P = (x_1, \ldots, x_4)$, then
$-P = (-x_1, \ldots, -x_4)$, and the point $-P$ is called the antipode of $P$. So since points of the
rotation group correspond to cosets $\{\pm P\}$, the group $SO_3$ can be obtained by
identifying antipodal points on the 3-sphere $SU_2$. The space obtained in this way is called
the real projective 3-space:


(3.3) $SO_3$ is homeomorphic to the real projective 3-space.


The number 3 refers again to the dimension of the space. Points of the real
projective 3-space are also in bijective correspondence with lines through the origin (or
one-dimensional subspaces) of $\mathbb{R}^4$. Every line through the origin meets the unit
sphere in a pair of antipodal points.

As we noted in Section 8 of Chapter 4, every element of $SO_3$ except the identity
can be described in terms of a pair $(v, \theta)$, where $v$ is a unit vector in the axis of
rotation and where $\theta$ is the angle of rotation. However, the two pairs $(v, \theta)$ and
$(-v, -\theta)$ represent the same rotation. The choice of one of these pairs is referred to
by physicists as the choice of a spin. It is not possible to make a choice of spin which
varies continuously over the whole group. Instead, the two possible choices define a
double covering of $SO_3 - \{I\}$. We may realize the set of all pairs $(v, \theta)$ as the
product space $S \times \Theta$, where $S$ is the 2-sphere of unit vectors in $\mathbb{R}^3$, and where $\Theta$ is
the set of nonzero angles $0 < \theta < 2\pi$. This product space maps to $SO_3$:


(3.4) $\psi: S \times \Theta \longrightarrow SO_3 - \{I\}$,

by sending $(v, \theta)$ to the rotation about $v$ through the angle $\theta$. The map $\psi$ is a double
covering of $SO_3 - \{I\}$ because every nontrivial rotation is associated to two pairs
$(v, \theta)$, $(-v, -\theta)$.

We now have two double coverings of $SO_3 - \{I\}$, namely $S \times \Theta$ and also
$SU_2 - \{\pm I\}$, and it is plausible that they are equivalent. This is true:


(3.5) Proposition. There is a homeomorphism $h: (SU_2 - \{\pm I\}) \longrightarrow S \times \Theta$ which
is compatible with the maps to $SO_3$, i.e., such that $\psi \circ h = \varphi$.

This map $h$ is not a group homomorphism. In fact, neither its domain nor its range is
a group.

Proposition (3.5) is not very difficult to prove, but the proof is slightly elusive
because there are two such homeomorphisms. They differ by a switch of the spin.
On the other hand, the fact that this homeomorphism exists follows from a general
theorem of topology, because the space $SU_2 - \{\pm I\}$ is simply connected. (A simply
connected space is one which is path connected and such that every loop in the space
can be contracted continuously to a point.) It is better to leave this proof to the
topologists.


Therefore every element of $SU_2$ except $\pm I$ can be described as a rotation of $\mathbb{R}^3$
together with a choice of spin. Because of this, $SU_2$ is often called the Spin group.

We now proceed to compute the homomorphism $\varphi$, and to begin, we must
select a conjugacy class. It is convenient to choose the one consisting of the trace-zero
matrices in $SU_2$, which is the one defined by $x_1 = 0$ and which is illustrated in
Figure (2.15). The group operates in the same way on the other classes. Let us call the
conjugacy class of trace-zero matrices $C$. An element $A$ of $C$ will be a matrix of the
form


(3.6) $A = \begin{bmatrix} y_2 i & y_3 + y_4 i \\ -y_3 + y_4 i & -y_2 i \end{bmatrix}$,

where

(3.7) $y_2^2 + y_3^2 + y_4^2 = 1$.


Notice that this matrix is skew-hermitian, that is, it has the property
(3.8) $A^* = -A$.


(We haven't run across skew-hermitian matrices before, but they aren't very
different from hermitian matrices. In fact, $A$ is a skew-hermitian matrix if and only if
$H = iA$ is hermitian.) The $2 \times 2$ skew-hermitian matrices with trace zero form a real
vector space $V$ of dimension 3, with basis


(3.9) $\mathbf{B} = \left( \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix} \right)$.


In the notation of (3.6), $A = \mathbf{B}Y$, where $Y = (y_2, y_3, y_4)^t$. So the basis $\mathbf{B}$ corresponds
to the standard basis $(e_2, e_3, e_4)$ in the space $\mathbb{R}^4$, and (3.7) tells us that our conjugacy
class is represented as the unit sphere in this space.

Note that $SU_2$ operates by conjugation on the whole space $V$ of trace-zero,
skew-hermitian matrices, not only on its unit sphere: If $A \in V$, $P \in SU_2$, and if
$B = PAP^* = PAP^{-1}$, then trace $B = 0$, and $B^* = (PAP^*)^* = PA^*P^* = P(-A)P^* = -B$.
Also, conjugation by a fixed matrix $P$ gives a linear operator on $V$, because
$P(A + A')P^* = PAP^* + PA'P^*$, and if $r$ is a real number, then $P(rA)P^* = rPAP^*$.
The matrix of this linear operator is defined to be $\varphi(P)$. To determine the matrix
explicitly, we conjugate the basis (3.9) by $P$ and rewrite the result in terms of the basis.
For example,


(3.10) $\begin{bmatrix} a & b \\ -\bar b & \bar a \end{bmatrix} \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix} \begin{bmatrix} \bar a & -b \\ \bar b & a \end{bmatrix} = i \begin{bmatrix} a\bar a - b\bar b & -2ab \\ -2\bar a\bar b & b\bar b - a\bar a \end{bmatrix}$.


The coordinates of this matrix are $y_2 = a\bar a - b\bar b$, $y_3 = i(-ab + \bar a\bar b)$, and
$y_4 = -(ab + \bar a\bar b)$. They form the first column of the matrix $\varphi(P)$. Similar computation for
the other columns yields


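(3.11) $\varphi(P) = \begin{bmatrix} a\bar a - b\bar b & i(\bar a b - a\bar b) & \bar a b + a\bar b \\ i(\bar a\bar b - ab) & \tfrac12(a^2 + \bar a^2 + b^2 + \bar b^2) & \tfrac{i}{2}(a^2 - \bar a^2 - b^2 + \bar b^2) \\ -(ab + \bar a\bar b) & \tfrac{i}{2}(\bar a^2 - a^2 + \bar b^2 - b^2) & \tfrac12(a^2 + \bar a^2 - b^2 - \bar b^2) \end{bmatrix}$.

We will not make use of the above computation. Even without it, we know
that $\varphi(P)$ is a real $3 \times 3$ matrix because it is the matrix of a linear operator on a real
vector space $V$ of dimension 3.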
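As a numerical sanity check, the matrix of conjugation can be computed directly from the definition and compared with formula (3.11). The following Python sketch is only an illustration: the function names (`su2`, `phi`, `phi_formula`, `det3`) are hypothetical helpers, and it assumes the convention $a = x_1 + x_2 i$, $b = x_3 + x_4 i$ for the entries of $P$.

```python
import math

def su2(x1, x2, x3, x4):
    """Matrix [[a, b], [-conj(b), conj(a)]] with a = x1 + x2*i, b = x3 + x4*i,
    after scaling (x1, x2, x3, x4) to a unit vector so the result lies in SU_2."""
    n = math.sqrt(x1 * x1 + x2 * x2 + x3 * x3 + x4 * x4)
    a, b = complex(x1, x2) / n, complex(x3, x4) / n
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def mul(X, Y):
    """Product of two 2x2 complex matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(X):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(X[j][i]).conjugate() for j in range(2)] for i in range(2)]

# The basis (3.9) of trace-zero skew-hermitian matrices.
BASIS = [
    [[1j, 0], [0, -1j]],
    [[0, 1], [-1, 0]],
    [[0, 1j], [1j, 0]],
]

def coords(A):
    """Coordinates (y2, y3, y4) of A in the notation of (3.6)."""
    return [A[0][0].imag, A[0][1].real, A[0][1].imag]

def phi(P):
    """Matrix of A -> P A P* in the basis (3.9); column j is the image of basis vector j."""
    cols = [coords(mul(mul(P, B), adjoint(P))) for B in BASIS]
    return [[cols[j][i] for j in range(3)] for i in range(3)]

def phi_formula(P):
    """The same matrix, read off from formula (3.11)."""
    a, b = P[0][0], P[0][1]
    ac, bc = a.conjugate(), b.conjugate()
    rows = [
        [a * ac - b * bc, 1j * (ac * b - a * bc), ac * b + a * bc],
        [1j * (ac * bc - a * b), (a * a + ac * ac + b * b + bc * bc) / 2,
         1j * (a * a - ac * ac - b * b + bc * bc) / 2],
        [-(a * b + ac * bc), 1j * (ac * ac - a * a + bc * bc - b * b) / 2,
         (a * a + ac * ac - b * b - bc * bc) / 2],
    ]
    # Every entry is real up to rounding, so only the real parts are kept.
    return [[entry.real for entry in row] for row in rows]

def det3(M):
    """Determinant of a 3x3 real matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
```

For a sample matrix such as `P = su2(1.0, 2.0, 3.0, 4.0)`, one can compare `phi(P)` with `phi_formula(P)` entry by entry, and check that `det3(phi(P))` is 1 up to rounding, in line with Lemma (3.13) below.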


(3.12) Lemma. The map $P \mapsto \varphi(P)$ defines a homomorphism $SU_2 \longrightarrow GL_3(\mathbb{R})$.


Proof. It follows from the associative law [Chapter 5 (5.1)] for the operation
of conjugation that $\varphi$ is compatible with multiplication: The operation of a product
$PQ$ on a matrix $A$ is $(PQ)A(PQ)^* = P(QAQ^*)P^*$. This is the composition of the
operations of conjugation by $P$ and by $Q$. Since the matrix of the composition of linear
operators is the product matrix, $\varphi(PQ) = \varphi(P)\varphi(Q)$. Since $\varphi$ is compatible with
multiplication, $\varphi(P^{-1})\varphi(P) = \varphi(I_2) = I_3$. Therefore $\varphi(P)$ is invertible for every $P$, and so $\varphi$
is a homomorphism from $SU_2$ to $GL_3(\mathbb{R})$, as asserted. □


(3.13) Lemma. For any $P$, $\varphi(P) \in SO_3$. Hence $P \mapsto \varphi(P)$ defines a
homomorphism $SU_2 \longrightarrow SO_3$.


Proof. One could prove this lemma using Formula (3.11). To prove it
conceptually, we note that dot product on $\mathbb{R}^3$ carries over to a bilinear form on $V$ with a
nice expression in terms of the matrices. Using the notation of (3.6), we define
$\langle A, A' \rangle = y_2y_2' + y_3y_3' + y_4y_4'$. Then


(3.14) $\langle A, A' \rangle = -\tfrac12\,\mathrm{trace}(AA')$.
This is proved by computation:

$AA' = \begin{bmatrix} -(y_2y_2' + y_3y_3' + y_4y_4') + i(y_3y_4' - y_4y_3') & * \\ * & -(y_2y_2' + y_3y_3' + y_4y_4') - i(y_3y_4' - y_4y_3') \end{bmatrix}$,

and so trace $AA' = -2\langle A, A' \rangle$.


This expression for dot product shows that it is preserved by conjugation by an
element $P \in SU_2$:


$\langle PAP^*, PA'P^* \rangle = -\tfrac12\,\mathrm{trace}(PAP^*PA'P^*) = -\tfrac12\,\mathrm{trace}(PAA'P^*) = -\tfrac12\,\mathrm{trace}(AA') = \langle A, A' \rangle$.

Or, in terms of the coordinate vectors, $(\varphi(P)Y \cdot \varphi(P)Y') = (Y \cdot Y')$. It follows that
$\varphi(P)$ lies in the orthogonal group $O_3 = O_3(\mathbb{R})$ [Chapter 4 (5.13)].

To complete the proof, let us verify that $\varphi(P)$ has determinant 1 for every
$P \in SU_2$: Being a sphere, $SU_2$ is path connected. So only one of the two possible
values $\pm 1$ can be taken on by the continuous function $\det \varphi(P)$. Since $\varphi(I_2) = I_3$
and $\det I_3 = 1$, the value is always $+1$, and $\varphi(P) \in SO_3$, as required. □


(3.15) Lemma. $\ker \varphi = \{\pm I\}$.


Proof. The kernel of $\varphi$ consists of the matrices $P \in SU_2$ which act trivially on
$V$, meaning that $PAP^* = A$ for all skew-hermitian matrices of trace zero. Suppose
that $P$ has the property $PAP^* = A$, or $PA = AP$, for all $A \in V$. We test it on the basis
(3.9). The test leads to $b = 0$, $a = \bar a$, which gives the two possibilities $P = \pm I$, and
they are in the kernel. So $\ker \varphi = \{\pm I\}$, as claimed. □


(3.16) Lemma. The image of the map $\varphi$ is $SO_3$.


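Proof. We first compute $\varphi(P)$ explicitly on the subgroup $T$ of diagonal
matrices in $SU_2$. Let $z = y_3 + y_4 i$. Then


(3.17) $PAP^* = \begin{bmatrix} a & 0 \\ 0 & \bar a \end{bmatrix} \begin{bmatrix} y_2 i & z \\ -\bar z & -y_2 i \end{bmatrix} \begin{bmatrix} \bar a & 0 \\ 0 & a \end{bmatrix} = \begin{bmatrix} y_2 i & a^2 z \\ -\bar a^2 \bar z & -y_2 i \end{bmatrix}$.


So $\varphi(P)$ fixes the first coordinate $y_2$ and it multiplies $z$ by $a^2$. Since $|a| = 1$, we may
write $a = e^{i\theta}$. Multiplication by $a^2 = e^{2i\theta}$ defines a rotation by $2\theta$ of the complex
$z$-plane. Therefore

(3.18) $\varphi(P) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 2\theta & -\sin 2\theta \\ 0 & \sin 2\theta & \cos 2\theta \end{bmatrix}$.


This shows that the image of $\varphi$ in $SO_3$ contains the subgroup $H$ of all rotations about
the point $(1, 0, 0)^t$. This point corresponds to the matrix $E = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$. Since the unit
sphere $C$ is a conjugacy class, the operation of $SU_2$ is transitive. So if $Y$ is any unit
vector in $\mathbb{R}^3$, there is an element $Q \in SU_2$ such that $\varphi(Q)(1, 0, 0)^t = Y$, or in matrix
notation, such that $QEQ^* = A$. The conjugate subgroup $\varphi(Q)H\varphi(Q)^*$ of rotations
about $Y$ is also in the image of $\varphi$. Since every element of $SO_3$ is a rotation, $\varphi$ is
surjective. □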


The cosets making up the Hopf fibration, which was mentioned at the end of
the last section, are the fibres of a continuous surjective map
(3.19) $\pi: S^3 \longrightarrow S^2$


from the 3-sphere to the 2-sphere. To define $\pi$, we interpret $S^3$ as the special
unitary group $SU_2$, and $S^2$ as the conjugacy class $C$ of trace-zero matrices, as above. We
set $E = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$ and we define $\pi(P) = PEP^*$, for $P \in SU_2$. The proof of the
following proposition is left as an exercise.


(3.20) Proposition. The fibres of the map $\pi$ are the left cosets $QT$ of the group $T$
of diagonal matrices in $SU_2$. □
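One half of Proposition (3.20), that $\pi$ is constant on each left coset $QT$, is easy to test numerically. The sketch below is an illustration only; the helper names (`su2`, `diagonal`, `hopf`) are hypothetical, and the output of `hopf` is the coordinate vector $(y_2, y_3, y_4)$ of $PEP^*$ in the notation of (3.6).

```python
import cmath
import math

def su2(x1, x2, x3, x4):
    """Matrix [[a, b], [-conj(b), conj(a)]] in SU_2 built from a normalised 4-vector."""
    n = math.sqrt(x1 * x1 + x2 * x2 + x3 * x3 + x4 * x4)
    a, b = complex(x1, x2) / n, complex(x3, x4) / n
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def mul(X, Y):
    """Product of two 2x2 complex matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(X):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(X[j][i]).conjugate() for j in range(2)] for i in range(2)]

def diagonal(theta):
    """The element diag(e^{i theta}, e^{-i theta}) of the subgroup T."""
    a = cmath.exp(1j * theta)
    return [[a, 0], [0, a.conjugate()]]

E = [[1j, 0], [0, -1j]]

def hopf(P):
    """pi(P) = P E P*, returned as the coordinate vector (y2, y3, y4)."""
    A = mul(mul(P, E), adjoint(P))
    return (A[0][0].imag, A[0][1].real, A[0][1].imag)

def same_point(u, v, tol=1e-12):
    """True if two points of R^3 agree up to rounding."""
    return all(abs(x - y) < tol for x, y in zip(u, v))
```

With `Q = su2(1.0, 2.0, 3.0, 4.0)`, the points `hopf(mul(Q, diagonal(t)))` should agree with `hopf(Q)` for every angle `t`, since a diagonal matrix commutes with $E$. The sketch does not test the converse, that two matrices with the same image lie in the same coset.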


4. THE SPECIAL LINEAR GROUP $SL_2(\mathbb{R})$


Since the special unitary group is a sphere, it is a compact set. As an example of
a noncompact group, we will describe the special linear group $SL_2(\mathbb{R})$. To simplify
notation, we denote $SL_2(\mathbb{R})$ by $SL_2$ in this section.

Invertible $2 \times 2$ matrices operate by left multiplication on the space $\mathbb{R}^2$ of
column vectors, and we can look at the associated action on rays in $\mathbb{R}^2$. A ray is a
half line $R = \{rX \mid r \geq 0\}$. The set of rays is in bijective correspondence with the
points on the unit circle $S^1$, the ray $R$ corresponding to the point $R \cap S^1$.

Our group $SL_2$ operates by left multiplication on the set of rays. Let us denote
by $H$ the stabilizer of the ray $R_1 = \{re_1\}$ in $SL_2(\mathbb{R})$. It consists of matrices


(4.1) $B = \begin{bmatrix} a & b \\ 0 & a^{-1} \end{bmatrix}$,


where $a$ is positive and $b$ is arbitrary.
The rotation group $SO_2$ is another subgroup of $SL_2$, and it operates transitively
on the set of rays.


(4.2) Proposition. The map $f: SO_2 \times H \longrightarrow SL_2$ defined by $f(Q, B) = QB$ is a
homeomorphism (but not a group homomorphism).


Proof. Notice that $H \cap SO_2 = \{I\}$. Therefore $f$ is injective [Chapter 2 (8.6)].
To prove surjectivity of $f$, let $P$ be an arbitrary element of $SL_2$, and let $R_1$ be the ray
$\{re_1 \mid r \geq 0\}$. Choose a rotation $Q \in SO_2$ such that $PR_1 = QR_1$. Then $Q^{-1}P$ is in the
stabilizer $H$, say $Q^{-1}P = B$, or


(4.3) $P = QB$.


Since $f$ is defined by matrix multiplication, it is a continuous map. Also, in the
construction of the inverse map, the rotation $Q$ depends continuously on $P$ because the
ray $PR_1$ does. Then $B = Q^{-1}P$ also is a continuous function of $P$, and this shows that
$f^{-1}$ is continuous as well. □
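The construction of $f^{-1}$ in this proof is explicit enough to code. The following Python sketch is an illustration under the assumption that the input really has determinant 1; the function name `decompose` is hypothetical.

```python
import math

def mul(X, Y):
    """Product of two 2x2 real matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def decompose(P):
    """Split P in SL_2(R) as P = Q B, with Q a rotation and B = [[a, b], [0, 1/a]], a > 0."""
    p, r = P[0][0], P[1][0]        # first column of P, i.e. the vector P e_1
    n = math.hypot(p, r)           # its length; nonzero because P is invertible
    c, s = p / n, r / n
    Q = [[c, -s], [s, c]]          # the rotation carrying the ray R_1 to the ray P R_1
    Q_inv = [[c, s], [-s, c]]      # the inverse of a rotation is its transpose
    B = mul(Q_inv, P)              # B = Q^{-1} P stabilizes the ray R_1
    return Q, B
```

For example, `decompose([[2.0, 3.0], [1.0, 2.0]])` returns a rotation `Q` and a matrix `B` whose lower left entry vanishes up to rounding and whose upper left entry is the positive number $|Pe_1|$. Both factors are built from continuous operations on the entries of $P$, which mirrors the continuity argument above.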


Note that $H$ can be identified by the rule $B \leftrightarrow (a, b)$ with the product space
(positive reals) $\times\ \mathbb{R}$. And the space of positive reals is homeomorphic by the log
function to the space $\mathbb{R}$ of all real numbers. Thus $H$ is homeomorphic to $\mathbb{R}^2$. Since
$SO_2$ is a circle, we find that


(4.4) $SL_2(\mathbb{R})$ is homeomorphic to the product space $S^1 \times \mathbb{R}^2$.


The special linear group can be related to the Lorentz group $O_{2,1}$ of
two-dimensional space-time by a method analogous to that used in Section 3 for the
orthogonal representation of $SU_2$. Let the coordinates in $\mathbb{R}^3$ be $y_1, y_2, t$, with the
Lorentz form


(4.5) $y_1y_1' + y_2y_2' - tt'$,


and let $W$ be the space of real trace-zero matrices. Using the basis


(4.6) $\mathbf{B} = \left( \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right)$,


we associate to a coordinate vector $(y_1, y_2, t)^t$ the matrix

(4.7) $A = \begin{bmatrix} y_1 & y_2 + t \\ y_2 - t & -y_1 \end{bmatrix}$.


We use this representation of trace-zero matrices because the Lorentz form (4.5) has
a simple matrix interpretation on such matrices:
(4.8) $\langle A, A' \rangle = y_1y_1' + y_2y_2' - tt' = \tfrac12\,\mathrm{trace}(AA')$.
The group $SL_2$ acts on $W$ by conjugation,
(4.9) $P, A \mapsto PAP^{-1}$,
and this action preserves the Lorentz form on $W$, because
$\mathrm{trace}(AA') = \mathrm{trace}((PAP^{-1})(PA'P^{-1}))$,


as in the previous section. Since conjugation is a linear operator on $W$, it defines a
homomorphism $\varphi: SL_2 \longrightarrow GL_3(\mathbb{R})$. Since conjugation preserves the Lorentz form,
the image $\varphi(P)$ of $P$ is an element of $O_{2,1}$.
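The invariance of the Lorentz form under conjugation can be checked on examples. The Python sketch below is an illustration only: it assumes that the trace-zero matrix attached to $(y_1, y_2, t)$ is $\begin{bmatrix} y_1 & y_2 + t \\ y_2 - t & -y_1 \end{bmatrix}$, which is consistent with (4.8), and the function names are hypothetical.

```python
def mul(X, Y):
    """Product of two 2x2 real matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def to_matrix(y1, y2, t):
    """Trace-zero matrix assumed to correspond to the coordinate vector (y1, y2, t)."""
    return [[y1, y2 + t], [y2 - t, -y1]]

def lorentz(A, A2):
    """The form (4.8), computed as half the trace of the product A A'."""
    M = mul(A, A2)
    return 0.5 * (M[0][0] + M[1][1])

def conjugate(P, A):
    """P A P^{-1} for a 2x2 matrix P of determinant 1."""
    (p, q), (r, s) = P
    P_inv = [[s, -q], [-r, p]]     # valid because det P = 1
    return mul(mul(P, A), P_inv)
```

For instance, with `A = to_matrix(1, 2, 3)` and `A2 = to_matrix(4, 5, 6)`, the value `lorentz(A, A2)` equals $1 \cdot 4 + 2 \cdot 5 - 3 \cdot 6 = -4$, and conjugating both matrices by `P = [[2, 3], [1, 2]]` leaves this value unchanged.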


(4.10) Theorem. The kernel of the homomorphism $\varphi$ is the subgroup $\{\pm I\}$, and
the image is the path-connected component $O_{2,1}^0$ of $O_{2,1}$ containing the identity $I$.
Therefore $O_{2,1}^0 \approx SL_2(\mathbb{R})/\{\pm I\}$.


It can be shown that the two-dimensional Lorentz group has four path-connected
components.

The fact that the kernel of $\varphi$ is $\{\pm I\}$ is easy to check, and the last assertion of
the theorem follows from the others. We omit the proof that the image of $\varphi$ is the
subgroup $O_{2,1}^0$. □


5. ONE-PARAMETER SUBGROUPS


In Chapter 4, we defined the exponential of a matrix by the series
(5.1) $e^A = I + (1/1!)A + (1/2!)A^2 + (1/3!)A^3 + \cdots$.


We will now use this function to describe the homomorphisms from the additive
group of real numbers to the general linear group, which are differentiable functions
of the variable $t \in \mathbb{R}$. Such a homomorphism is called a one-parameter subgroup of
$GL_n$. (Actually, this use of the phrase “one-parameter subgroup” to describe such
homomorphisms is a misnomer. The image of $\varphi$ should be called the subgroup.)


(5.2) Proposition.


(a) Let $A$ be an arbitrary real or complex matrix, and let $GL_n$ denote $GL_n(\mathbb{R})$ or
$GL_n(\mathbb{C})$, according to the case. The map $\varphi: \mathbb{R}^+ \longrightarrow GL_n$ defined by
$\varphi(t) = e^{tA}$ is a group homomorphism.

(b) Conversely, let $\varphi: \mathbb{R}^+ \longrightarrow GL_n$ be a homomorphism which is a differentiable
function of the variable $t \in \mathbb{R}^+$, and let $A$ denote its derivative $\varphi'(0)$ at the
origin. Then $\varphi(t) = e^{tA}$ for all $t$.


Proof. For any two real numbers $r, s$, the two matrices $rA$ and $sA$ commute.
So Chapter 4 (7.13) tells us that


(5.3) $e^{(r+s)A} = e^{rA}e^{sA}$.


This shows that $\varphi(t) = e^{tA}$ is a homomorphism. Conversely, let $\varphi$ be a
differentiable homomorphism $\mathbb{R}^+ \longrightarrow GL_n$. The assumption that $\varphi$ is a homomorphism
allows us to compute its derivative at any point. Namely, it tells us that
$\varphi(t + \Delta t) = \varphi(\Delta t)\varphi(t)$ and $\varphi(t) = \varphi(0)\varphi(t)$. Thus


(5.4) $\dfrac{\varphi(t + \Delta t) - \varphi(t)}{\Delta t} = \dfrac{\varphi(\Delta t) - \varphi(0)}{\Delta t}\,\varphi(t)$.

Letting $\Delta t \longrightarrow 0$, we find $\varphi'(t) = \varphi'(0)\varphi(t) = A\varphi(t)$. Therefore $\varphi(t)$ is a
matrix-valued function which solves the differential equation


(5.5) $\dfrac{d\varphi}{dt} = A\varphi$.

The function $e^{tA}$ is another solution, and both solutions take the value $I$ at $t = 0$. It
follows that $\varphi(t) = e^{tA}$ [see Chapter 4 (8.14)]. □
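Both halves of the argument, the homomorphism property (5.3) and the differential equation (5.5), can be checked numerically. The Python sketch below approximates the exponential by a partial sum of the series (5.1); the function names and the number of terms are illustrative assumptions, and a truncated series is adequate only for matrices of moderate size.

```python
def mat_mul(X, Y):
    """Product of two n x n matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, X):
    """The matrix c * X."""
    return [[c * x for x in row] for row in X]

def expm(A, terms=40):
    """Partial sum I + A + A^2/2! + ... of the exponential series (5.1)."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = scale(1.0 / k, mat_mul(term, A))      # term is now A^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def close(X, Y, tol=1e-9):
    """True if two matrices agree entrywise up to tol."""
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def derivative(A, t, h=1e-5):
    """Central difference approximation to d/dt e^{tA}."""
    plus, minus = expm(scale(t + h, A)), expm(scale(t - h, A))
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(plus, minus)]
```

With a sample matrix such as `A = [[0.3, -1.2], [0.7, 0.5]]`, one expects `expm(scale(r + s, A))` to agree with `mat_mul(expm(scale(r, A)), expm(scale(s, A)))`, and `derivative(A, t)` to agree with `mat_mul(A, expm(scale(t, A)))` up to the error of the difference quotient.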


By the proposition we have just proved, the one-parameter subgroups all have
the form $\varphi(t) = e^{tA}$. They are in bijective correspondence with $n \times n$ matrices.


(5.6) Figure. Some one-parameter subgroups of $\mathbb{C}^\times = GL_1(\mathbb{C})$.


Now suppose that a subgroup $G$ of $GL_n$ is given. We may ask for one-parameter
subgroups of $G$, meaning homomorphisms $\varphi: \mathbb{R}^+ \longrightarrow G$, or, equivalently,
homomorphisms to $GL_n$ whose image is in $G$. Since a one-parameter subgroup of
$GL_n$ is determined by a matrix, this amounts to asking for the matrices $A$ such that
$e^{tA} \in G$ for all $t$. It turns out that linear groups of positive dimension always have
one-parameter subgroups and that they are not hard to determine for a particular
group.


(5.7) Examples. 


(a) The usual parametrization of the unit circle in the complex plane is a
one-parameter subgroup of $U_1$:


$t \mapsto e^{it} = \cos t + i \sin t$.

(b) A related example is obtained for $SO_2$ by setting

$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. Then $e^{tA} = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}$.

This is the standard parametrization of the rotation matrices.


In examples (a) and (b), the image of the homomorphism is the whole subgroup.


(c) Let $A$ be the $2 \times 2$ matrix unit $e_{12}$. Then since $A^2 = 0$, all but two terms of the
series expansion for the exponential vanish, and


$e^{tA} = I + e_{12}t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$.


In this case the exponential map defines an isomorphism from $\mathbb{R}^+$ to its image,
which is the group of triangular matrices with diagonal entries equal to 1.


(d) The one-parameter subgroups of $SU_2$ are the conjugates of the group of diagonal
special unitary matrices, the longitudes described in (2.13). □


Instead of attempting to state a general theorem describing one-parameter sub- 
groups of a group, we will determine them for the orthogonal and special linear 
groups as examples of the methods used. We will need to know that the exponential 
function on matrices has an inverse function. 


(5.8) Proposition. The matrix exponential maps a small neighborhood $S$ of 0 in
$\mathbb{R}^{n \times n}$ homeomorphically to a neighborhood $T$ of $I$.


Proof. This proposition follows from the Inverse Function Theorem, which
states that a differentiable function $f: \mathbb{R}^k \longrightarrow \mathbb{R}^k$ has an inverse function at a point $p$
if the Jacobian matrix $(\partial f_i/\partial x_j)(p)$ is invertible. We must check this for the matrix
exponential at the zero matrix in $\mathbb{R}^{n \times n}$. This is a notationally unpleasant but easy
computation. Let us denote a variable matrix by $X$. The Jacobian matrix is the
$n^2 \times n^2$ matrix whose entries are $(\partial(e^X)_{\alpha\beta}/\partial x_{ij})|_{X=0}$. We use the fact that
$\frac{d}{dt}(e^{tA})|_{t=0} = A$. It follows directly from the definition of the partial derivative that
$(\partial e^X/\partial x_{ij})|_{X=0} = (de^{te_{ij}}/dt)|_{t=0} = e_{ij}$. Therefore $(\partial(e^X)_{\alpha\beta}/\partial x_{ij})|_{X=0} = 0$ if $(\alpha, \beta) \neq (i, j)$
and $(\partial(e^X)_{ij}/\partial x_{ij})|_{X=0} = 1$. The Jacobian matrix is the $n^2 \times n^2$ identity matrix. □


We will now describe one-parameter subgroups of the orthogonal group $O_n$.
Here we are asking for the matrices $A$ such that $e^{tA}$ is orthogonal for all $t$.


(5.9) Lemma. If $A$ is skew-symmetric, then $e^A$ is orthogonal. Conversely, there is
a neighborhood $S'$ of 0 in $\mathbb{R}^{n \times n}$ such that if $e^A$ is orthogonal and $A \in S'$, then $A$ is
skew-symmetric.


Proof. To avoid confusing the variable $t$ with the symbol for the transpose
matrix, we denote the transpose of the matrix $A$ by $A^*$ here. If $A$ is skew-symmetric,
then $e^{(A^*)} = e^{-A}$. The relation $e^{(A^*)} = (e^A)^*$ is clear from the definition of the
exponential, and $e^{-A} = (e^A)^{-1}$ by Chapter 4 (8.10). Thus $(e^A)^* = e^{(A^*)} = e^{-A} = (e^A)^{-1}$.
This shows that $e^A$ is orthogonal. For the converse, we choose $S'$ small enough so
that if $A \in S'$, then $-A$ and $A^*$ are in the neighborhood $S$ of Proposition (5.8).
Suppose that $A \in S'$ and that $e^A$ is orthogonal. Then $e^{(A^*)} = e^{-A}$, and since the
exponential is injective on $S$ by Proposition (5.8), this means that $A^* = -A$, that is,
that $A$ is skew-symmetric. □


(5.10) Corollary. The one-parameter subgroups of the orthogonal group $O_n$ are
the homomorphisms $t \mapsto e^{tA}$, where $A$ is a real skew-symmetric matrix.


Proof. If $A$ is skew-symmetric, $tA$ is skew-symmetric for all $t$. So $e^{tA}$ is
orthogonal for all $t$, which means that $e^{tA}$ is a one-parameter subgroup of $O_n$.
Conversely, suppose that $e^{tA}$ is orthogonal for all $t$. For sufficiently small $\epsilon$, $\epsilon A$ is in
the neighborhood $S'$ of the lemma, and $e^{\epsilon A}$ is orthogonal. Therefore $\epsilon A$ is
skew-symmetric, and this implies that $A$ is skew-symmetric too. □
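The first half of Lemma (5.9) lends itself to a quick numerical experiment. The Python sketch below exponentiates a skew-symmetric $3 \times 3$ matrix with a truncated series and tests orthogonality; the helper names and the sample matrices are illustrative assumptions.

```python
def mat_mul(X, Y):
    """Product of two n x n matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    """Transpose of a square matrix."""
    n = len(X)
    return [[X[j][i] for j in range(n)] for i in range(n)]

def expm(A, terms=40):
    """Partial sum of the exponential series for a matrix of moderate size."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]   # A^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def is_orthogonal(R, tol=1e-9):
    """True if R^t R is the identity up to tol."""
    n = len(R)
    G = mat_mul(transpose(R), R)
    return all(abs(G[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))

SKEW = [[0.0, -0.3, 0.8],
        [0.3, 0.0, -0.5],
        [-0.8, 0.5, 0.0]]
NOT_SKEW = [[0.1, 0.0, 0.0],
            [0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0]]
```

One expects `is_orthogonal(expm(SKEW))` to hold, while `expm(NOT_SKEW)` is the diagonal matrix with entries $e^{0.1}, 1, 1$, which is not orthogonal.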


This corollary is illustrated by Example (5.7b).

Next, let us consider the special linear group $SL_n(\mathbb{R})$.


(5.11) Proposition. Let $A$ be a matrix whose trace is zero. Then $e^A$ has
determinant 1. Conversely, there is a neighborhood $S'$ of 0 in $\mathbb{R}^{n \times n}$ such that if $A \in S'$ and
$\det e^A = 1$, then trace $A = 0$.


Proof. The first assertion follows from the pretty formula
(5.12) $e^{\mathrm{tr}A} = \det e^A$,


where $\mathrm{tr}A$ denotes the trace of the matrix. This formula follows in turn from the fact
that if the eigenvalues of a complex matrix $A$ are $\lambda_1, \ldots, \lambda_n$, then the eigenvalues of
$e^A$ are $e^{\lambda_1}, \ldots, e^{\lambda_n}$. We leave the proof of this fact as an exercise. Using it, we find


$e^{\mathrm{tr}A} = e^{\lambda_1 + \cdots + \lambda_n} = e^{\lambda_1} \cdots e^{\lambda_n} = \det e^A$.


For the converse, we note that if $|x| < 1$, $e^x = 1$ implies $x = 0$. We choose $S'$
small enough so that $|\mathrm{tr}A| < 1$ if $A \in S'$. Then if $\det e^A = e^{\mathrm{tr}A} = 1$ and if $A \in S'$,
$\mathrm{tr}A = 0$. □


(5.13) Corollary. The one-parameter subgroups of the special linear group
$SL_n(\mathbb{R})$ are the homomorphisms $t \mapsto e^{tA}$, where $A$ is a real $n \times n$ matrix whose
trace is zero. □
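Formula (5.12) can be tested numerically in the $2 \times 2$ case. The following Python sketch is an illustration with hypothetical helper names; it uses a truncated exponential series, so it is meant only for matrices of moderate size.

```python
import math

def mat_mul(X, Y):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(A, terms=60):
    """Partial sum of the exponential series for a 2x2 matrix."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]   # A^k / k!
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    """Trace of a 2x2 matrix."""
    return M[0][0] + M[1][1]

def formula_gap(A):
    """Difference det(e^A) - e^{tr A}; formula (5.12) says this is zero."""
    return det2(expm(A)) - math.exp(trace(A))
```

For a trace-zero matrix such as `[[1.0, 2.0], [3.0, -1.0]]`, the determinant of `expm` of that matrix should be 1 up to rounding, as Proposition (5.11) asserts; for a general matrix `formula_gap` should be close to zero.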


The simplest one-parameter subgroup of SL2(R) is described in Example 
(5.7c). 


6. THE LIE ALGEBRA 


As always, we think of a linear group $G$ as a subset of $\mathbb{R}^{n \times n}$ or of $\mathbb{C}^{n \times n}$. The space of
vectors tangent to $G$ at the identity matrix $I$, which we will describe in this section,
is called the Lie algebra of the group.

We will begin by reviewing the definition of tangent vector. If
$\varphi(t) = (\varphi_1(t), \ldots, \varphi_k(t))$ is a differentiable path in $\mathbb{R}^k$, its velocity vector $v = \varphi'(t)$ is tangent
to the path at the point $x = \varphi(t)$. This is the basic observation from which the
definition of tangent vector is derived.

Suppose that we are given a subset $S$ of $\mathbb{R}^k$. A vector $v$ is said to be tangent to
$S$ at a point $x$ if there is a differentiable path $\varphi(t)$ lying entirely in $S$, such that
$\varphi(0) = x$ and $\varphi'(0) = v$.

If our subset $S$ is the locus of zeros of one or more polynomial functions
$f(x_1, \ldots, x_k)$, it is called a real algebraic set:


(6.1) $S = \{x \mid f(x) = 0\}$.
For example, the unit circle in $\mathbb{R}^2$ is a real algebraic set because it is the locus of
zeros of the polynomial $f(x_1, x_2) = x_1^2 + x_2^2 - 1$.


The chain rule for differentiation provides a necessary condition for a vector to
be tangent to a real algebraic set $S$. Let $\varphi(t)$ be a path in $S$, and let $x = \varphi(t)$ and
$v = \varphi'(t)$. Since the path is in $S$, the functions $f(\varphi(t))$ vanish identically; hence
their derivatives also vanish identically:


(6.2) $0 = \dfrac{d}{dt} f(\varphi(t)) = \dfrac{\partial f}{\partial x_1}v_1 + \cdots + \dfrac{\partial f}{\partial x_k}v_k = (\nabla f(x) \cdot v)$,


where $\nabla f = \left( \dfrac{\partial f}{\partial x_1}, \ldots, \dfrac{\partial f}{\partial x_k} \right)$ is the gradient vector.


(6.3) Corollary. Let $S$ be a real algebraic set in $\mathbb{R}^k$, defined as the locus of zeros
of one or more polynomial functions $f(x)$. The tangent vectors to $S$ at $x$ are
orthogonal to the gradients $\nabla f(x)$. □


For instance, if $S$ is the unit circle and $x$ is the point $(1, 0)$, then the gradient
vector $\nabla f(x)$ is $(2, 0)$. Corollary (6.3) tells us that tangent vectors at $(1, 0)$ have the
form $(0, c)$, that is, that they are vertical, which is as it should be.

Computing tangent vectors by means of parametrized paths is clumsy because
there are many paths with the same tangent. If we are interested only in the tangent
vector, then we can throw out all of the information contained in a path except for
the first-order term of its Taylor expansion. To do this systematically, we introduce a
formal infinitesimal element $\epsilon$. This means that we work algebraically with the rule


(6.4) $\epsilon^2 = 0$.


Just as with complex numbers, where the rule is $i^2 = -1$, we can use this rule to
define a multiplication on the vector space


$E = \{a + b\epsilon \mid a, b \in \mathbb{R}\}$


of formal linear combinations of $(1, \epsilon)$ with real coefficients. The rule for
multiplication is
(6.5) $(a + b\epsilon)(c + d\epsilon) = ac + (bc + ad)\epsilon$.


In other words, we expand formally, using the relations $\epsilon c = c\epsilon$ for all $c \in \mathbb{R}$ and
$\epsilon^2 = 0$. As with complex numbers, addition is vector addition:


$(a + b\epsilon) + (c + d\epsilon) = (a + c) + (b + d)\epsilon$.


The main difference between $\mathbb{C}$ and $E$ is that $E$ is not a field, because $\epsilon$ has no
multiplicative inverse. [It is a ring (see Chapter 10).]

Given a point $x$ of $\mathbb{R}^k$ and a vector $v \in \mathbb{R}^k$, the sum $x + v\epsilon$ is a vector with
entries in $E$ which we interpret intuitively as an infinitesimal change in $x$, in the
direction of $v$. Notice that we can evaluate a polynomial $f(x) = f(x_1, \ldots, x_k)$ at $x + v\epsilon$
using Taylor's expansion. Since $\epsilon^2 = 0$, the terms of degree $\geq 2$ in $\epsilon$ drop out, and
we are left with an element of $E$:


(6.6) $f(x + v\epsilon) = f(x) + \left( \dfrac{\partial f}{\partial x_1}v_1 + \cdots + \dfrac{\partial f}{\partial x_k}v_k \right)\epsilon = f(x) + (\nabla f(x) \cdot v)\epsilon$.

Working with rule (6.4) amounts to ignoring the higher-order terms in $\epsilon$. Thus the
dot product $(\nabla f(x) \cdot v)$ represents the infinitesimal change in $f$ which results when
we make an infinitesimal change in $x$ in the direction of $v$.
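The arithmetic of $E$ is simple enough to implement directly, which gives a mechanical way to evaluate a polynomial at $x + v\epsilon$. The Python sketch below is an illustration only; the class name `Dual` and the helper `infinitesimal_change` are hypothetical, and only addition, subtraction, and multiplication are supported, which is all that a polynomial needs.

```python
class Dual:
    """An element a + b*eps of E, where eps*eps = 0."""

    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    @staticmethod
    def lift(x):
        """Regard a real number as the element x + 0*eps of E."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.a + o.a, self.b + o.b)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.a - o.a, self.b - o.b)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        # Rule (6.5): (a + b eps)(c + d eps) = ac + (bc + ad) eps.
        return Dual(self.a * o.a, self.b * o.a + self.a * o.b)

    __rmul__ = __mul__


def circle(x1, x2):
    """Defining polynomial of the unit circle."""
    return x1 * x1 + x2 * x2 - 1


def infinitesimal_change(f, x, v):
    """Coefficient of eps in f(x + v*eps), i.e. the dot product of grad f(x) with v."""
    args = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return f(*args).b
```

At the point $(1, 0)$ of the unit circle, `infinitesimal_change(circle, (1.0, 0.0), (0.0, 1.0))` vanishes, while the direction $(1, 0)$ gives the value 2, matching the gradient $(2, 0)$ computed earlier.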

Going back to a real algebraic set $S$ defined by the polynomial equations
$f(x) = 0$, let $x$ be a point of $S$. Then $f(x) = 0$, so (6.6) tells us that


(6.7) $f(x + v\epsilon) = 0$ if and only if $(\nabla f(x) \cdot v) = 0$,


which is the same as the condition we obtained in Corollary (6.3). This suggests the
following definition: Let $S$ be a real algebraic set, defined by the polynomial
equations $f(x) = 0$. A vector $v$ is called an infinitesimal tangent to $S$ at $x$ if


(6.8) $f(x + v\epsilon) = 0$.


(6.9) Corollary. Let $x$ be a point of a real algebraic set $S$. Every tangent to $S$ at $x$
is an infinitesimal tangent. □


Notice that if we fix $x \in S$, the equations $(\nabla f(x) \cdot v) = 0$ are linear and
homogeneous in $v$. So the infinitesimal tangent vectors to $S$ at $x$ form a subspace of the
space of all vectors.

Actually, our terminology is slightly ambiguous. The definition of an
infinitesimal tangent depends on the equations $f$, not only on the set $S$. We must have
particular equations in mind when speaking of infinitesimal tangents.

For sets $S$ which are sufficiently smooth, the converse of (6.9) is also true:
Every infinitesimal tangent is a tangent vector. When this is the case, we can
compute the space of tangent vectors at a point $x \in S$ by solving the linear equations
$(\nabla f(x) \cdot v) = 0$ for $v$, which is relatively easy. However, this converse will not be
true at “singular points” of the set $S$, or if the defining equations for $S$ are chosen
poorly. For example, let $S$ denote the union of the two coordinate axes in $\mathbb{R}^2$. This is
a real algebraic set defined by the single equation $x_1x_2 = 0$. It is clear that at the
origin a tangent vector must be parallel to one of the two axes. On the other hand,
$\nabla f = (x_2, x_1)$, which is zero when $x_1 = x_2 = 0$. Therefore every vector is an
infinitesimal tangent to $S$ at the origin.
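The gap between the two notions at the origin can be made explicit. The following worked computation is a sketch using only the polynomial $f(x_1, x_2) = x_1x_2$, rule (6.4), and the definition of a tangent vector by differentiable paths.

```latex
% Infinitesimal tangents at the origin x = (0, 0): every direction v = (v_1, v_2) qualifies.
\[
  f(0 + v\epsilon) = (v_1\epsilon)(v_2\epsilon) = v_1 v_2\,\epsilon^2 = 0
  \qquad \text{for every } v = (v_1, v_2) \in \mathbb{R}^2 .
\]
% Tangents at the origin: a differentiable path \varphi(t) in S with \varphi(0) = (0, 0)
% satisfies \varphi_1(t)\varphi_2(t) = 0 for all t, so for its velocity v = \varphi'(0)
\[
  v_1 v_2 = \lim_{t \to 0} \frac{\varphi_1(t)}{t} \cdot \frac{\varphi_2(t)}{t} = 0 .
\]
% Hence v_1 = 0 or v_2 = 0: a tangent vector is parallel to one of the two axes.
```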

This completes our general discussion of tangent vectors. We will now apply
this discussion to the case that the set $S$ is one of our linear groups $G$ in $\mathbb{R}^{n \times n}$ or
$\mathbb{C}^{n \times n}$. The tangent vectors to $G$ will be $n^2$-dimensional vectors, and we will
represent them by matrices too. As we said earlier, the vectors tangent to $G$ at the
identity $I$ form the Lie algebra of the group.

The first thing to notice is that every one-parameter subgroup $e^{tA}$ of our linear
group $G$ is a parametrized path. We already know that its velocity vector
$(de^{tA}/dt)_{t=0}$ is $A$. So $A$ represents a tangent vector to $G$ at the identity; it is in the
Lie algebra. For example, the unitary group $U_1$ is the unit circle in the complex
plane, and $e^{it}$ is a one-parameter subgroup of $U_1$. The velocity vector of this
one-parameter subgroup at $t = 0$ is the vector $i$, which is indeed a tangent vector to the
unit circle at the point 1.


A matrix group $G$ which is a real algebraic set in $\mathbb{R}^{n \times n}$ is called a real
algebraic group. The classical linear groups such as $SL_n(\mathbb{R})$ and $O_n$ are real algebraic,
because their defining equations are polynomial equations in the matrix entries. For
example, the group $SL_2(\mathbb{R})$ is defined by the single polynomial equation $\det P = 1$:


$x_{11}x_{22} - x_{12}x_{21} - 1 = 0$.


The orthogonal group $O_3$ is defined by nine polynomials $f_{ij}$ expressing the condition
$P^tP = I$:


$f_{ij} = x_{1i}x_{1j} + x_{2i}x_{2j} + x_{3i}x_{3j} - \delta_{ij} = 0, \qquad \delta_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j. \end{cases}$


Complex groups such as the unitary groups can also be made into real algebraic
groups in $\mathbb{R}^{2n \times n}$ by separating the matrix entries into their real and imaginary parts.

It is a fact that for every infinitesimal tangent $A$ to a real algebraic group $G$ at
the identity, $e^{tA}$ is a one-parameter subgroup of $G$. In other words, there is a
one-parameter subgroup leading out from the identity in an arbitrary tangent direction.
This is quite remarkable for a nonabelian group, but it is true with essentially no
restriction. Unfortunately, though this fact is rather easy to check for a particular
group, it is fairly hard to give a general proof. Therefore we will content ourselves
with verifying particular cases.

Having an infinitesimal element available, we may work with matrices whose
entries are in $E$. Such a matrix will have the form $A + B\epsilon$, where $A, B$ are real
matrices. Intuitively, $A + B\epsilon$ represents an infinitesimal change in $A$ in the direction of
the matrix $B$. The rule for multiplying two such matrices is the same as (6.5):


(6.10) $(A + B\epsilon)(C + D\epsilon) = AC + (AD + BC)\epsilon$.


The product $B\epsilon D\epsilon$ is zero because $(b_{ij}\epsilon)(d_{kl}\epsilon) = 0$ for all values of the indices.
Let $G$ be a real algebraic group. To determine its infinitesimal tangent vectors
at the identity, we must determine the matrices $A$ such that


(6.11) $I + A\epsilon$,


which represents an infinitesimal change in $I$ in the direction of the matrix $A$,
satisfies the equations defining $G$. This is the definition (6.8) of an infinitesimal
tangent.

Let us make this computation for the special linear group $SL_n(\mathbb{R})$. The defining
equation for this group is $\det P = 1$. So $A$ is an infinitesimal tangent vector if
$\det(I + A\epsilon) = 1$. To describe this condition, we must calculate the change in the
determinant when we make an infinitesimal change in $I$. The formula is nice:


(6.12) $\det(I + A\epsilon) = 1 + (\mathrm{trace}\,A)\epsilon$.


The proof of this formula is left as an exercise. Using it, we find that $A$ is an
infinitesimal tangent vector if and only if trace $A = 0$.


(6.13) Proposition. The following conditions on a real $n \times n$ matrix $A$ are
equivalent:


(i) trace $A = 0$;

(ii) $e^{tA}$ is a one-parameter subgroup of $SL_n(\mathbb{R})$;
(iii) $A$ is in the Lie algebra of $SL_n(\mathbb{R})$;
(iv) $A$ is an infinitesimal tangent to $SL_n(\mathbb{R})$ at $I$.


Proof. Proposition (5.11) tells us that (i) $\Rightarrow$ (ii). Since $A$ is tangent to the path
$e^{tA}$ at $t = 0$, (ii) $\Rightarrow$ (iii). The implication (iii) $\Rightarrow$ (iv) is (6.9), and (iv) $\Rightarrow$ (i)
follows from (6.12). □
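Formula (6.12), whose proof is left as an exercise, is easy to see in the $2 \times 2$ case. The following worked computation is a sketch of that case, with a remark on why the same pattern should persist for larger $n$.

```latex
% The 2 x 2 case of (6.12), using only the rule \epsilon^2 = 0.
\[
  \det \begin{bmatrix} 1 + a_{11}\epsilon & a_{12}\epsilon \\ a_{21}\epsilon & 1 + a_{22}\epsilon \end{bmatrix}
  = (1 + a_{11}\epsilon)(1 + a_{22}\epsilon) - a_{12}a_{21}\,\epsilon^2
  = 1 + (a_{11} + a_{22})\epsilon .
\]
% For general n, every term of the complete expansion of the determinant other than the
% product of the diagonal entries contains at least two off-diagonal factors, each a
% multiple of \epsilon, and so it vanishes. The diagonal product contributes
\[
  \prod_{i=1}^{n} (1 + a_{ii}\epsilon) = 1 + (a_{11} + \cdots + a_{nn})\epsilon .
\]
```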


There is a general principle at work here. We have three sets of matrices $A$:
those such that $e^{tA}$ is a one-parameter subgroup of $G$, those which are in the Lie
algebra, and those which are infinitesimal tangents. Let us denote these three sets by
Exp($G$), Lie($G$), and Inf($G$). They are related by the following inclusions:


(6.14) $\mathrm{Exp}(G) \subset \mathrm{Lie}(G) \subset \mathrm{Inf}(G)$.


The first inclusion is true because $A$ is the tangent vector to $e^{tA}$ at $t = 0$, and the
second holds because every tangent vector is an infinitesimal tangent. If Exp($G$) =
Inf($G$), then these two sets are also equal to Lie($G$). Since the computations of
Exp($G$) and Inf($G$) are easy, this gives us a practical way of determining the Lie
algebra. A general theorem exists which implies that Exp($G$) = Inf($G$) for every real
algebraic group, provided that its defining equations are chosen properly. However,
it isn't worthwhile proving the general theorem here.

We will now make the computation for the orthogonal group O_n. The defining
equation for O_n is the matrix equation P^t P = I. In order for A to be an infinitesimal
tangent at the identity, it must satisfy the relation


(6.15) (I + Aε)^t (I + Aε) = I.


The left side of this relation expands to I + (A^t + A)ε, so the condition that I + Aε
be orthogonal is A^t + A = 0, or A is skew-symmetric. This agrees with the condition
(5.10) for e^{tA} to be a one-parameter subgroup of O_n.


(6.16) Proposition. The following conditions on a real n × n matrix A are
equivalent:
(i) A is skew-symmetric;
(ii) e^{tA} is a one-parameter subgroup of O_n;
(iii) A is in the Lie algebra of O_n;
(iv) A is an infinitesimal tangent to O_n at I. □
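
The equivalence of (i) and (ii) in Propositions (6.13) and (6.16) can be observed numerically. The sketch below is only an illustration in floating-point arithmetic: the function `mat_exp` is a hypothetical helper that approximates e^A by a truncated power series, and the three test matrices are arbitrary choices made here.

```python
def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(x):
    return [list(col) for col in zip(*x)]


def mat_exp(a, terms=40):
    """Approximate e^A by the partial sum I + A + A^2/2! + ... (truncated)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes A^k / k!
        term = [[entry / k for entry in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def max_deviation_from_identity(x):
    n = len(x)
    return max(abs(x[i][j] - (i == j)) for i in range(n) for j in range(n))


# A skew-symmetric matrix: e^A should be orthogonal.
skew = [[0.0, -0.7, 0.2], [0.7, 0.0, -0.5], [-0.2, 0.5, 0.0]]
P = mat_exp(skew)
orthogonality_error = max_deviation_from_identity(mat_mul(transpose(P), P))

# A matrix of trace zero: e^A should have determinant 1.
traceless = [[1.0, 2.0], [3.0, -1.0]]
Q = mat_exp(traceless)
det_Q = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]

# A symmetric (not skew) matrix: e^A is not orthogonal.
symmetric = [[0.0, 0.3], [0.3, 0.0]]
R = mat_exp(symmetric)
non_orthogonality = max_deviation_from_identity(mat_mul(transpose(R), R))

print(orthogonality_error, det_Q, non_orthogonality)
```

The first two quantities agree with the propositions up to rounding error, while the third shows that a matrix outside the Lie algebra leaves the group.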


The Lie algebra of a linear group has an additional structure, an operation
called the Lie bracket. The Lie bracket is the law of composition defined by the rule


(6.17) [A, B] = AB − BA.

This law of composition is not associative. It does, however, satisfy an identity
called the Jacobi identity,


(6.18) [A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0,


which is a substitute for the associative law.

To show that the bracket is a law of composition on the Lie algebra, we must
check that if A, B are in Lie(G), then [A, B] is also in Lie(G). This can be done easily
for any particular group. For the special linear group, the required verification is
that if A, B have trace zero, then AB − BA also has trace zero. This is true, because
trace AB = trace BA. Or let G = O_n, so that the Lie algebra is the space of
skew-symmetric matrices. We must verify that if A, B are skew, then [A, B] is skew too:


[A, B]^t = (AB − BA)^t = B^t A^t − A^t B^t = BA − AB = −[A, B],

as required.
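
These closure properties, the Jacobi identity (6.18), and the failure of associativity can all be confirmed on small integer matrices. The sketch below uses exact integer arithmetic; the matrices E, F, H, T1, T2, S1, S2 are examples chosen here, not taken from the text.

```python
def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(x, y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def mat_sub(x, y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(x, y)]


def bracket(x, y):
    """The Lie bracket [X, Y] = XY - YX of (6.17)."""
    return mat_sub(mat_mul(x, y), mat_mul(y, x))


def trace(x):
    return sum(x[i][i] for i in range(len(x)))


def transpose(x):
    return [list(col) for col in zip(*x)]


def negate(x):
    return [[-a for a in row] for row in x]


# Jacobi identity and non-associativity on 2 x 2 matrices.
E = [[0, 1], [0, 0]]
F = [[0, 0], [1, 0]]
H = [[1, 0], [0, -1]]
jacobi_sum = mat_add(mat_add(bracket(E, bracket(F, H)),
                             bracket(F, bracket(H, E))),
                     bracket(H, bracket(E, F)))
left_nested = bracket(bracket(E, F), F)    # [[E, F], F]
right_nested = bracket(E, bracket(F, F))   # [E, [F, F]]

# Closure for trace-zero matrices (Lie algebra of SL_3).
T1 = [[1, 2, 0], [3, -4, 5], [0, 1, 3]]
T2 = [[2, 0, 1], [1, 1, 1], [4, 0, -3]]
trace_of_bracket = trace(bracket(T1, T2))

# Closure for skew-symmetric matrices (Lie algebra of O_3).
S1 = [[0, 1, -2], [-1, 0, 3], [2, -3, 0]]
S2 = [[0, 4, 1], [-4, 0, -5], [-1, 5, 0]]
K = bracket(S1, S2)
bracket_is_skew = transpose(K) == negate(K)

print(jacobi_sum, left_nested, right_nested, trace_of_bracket, bracket_is_skew)
```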
The bracket operation is important because it is the infinitesimal version of the
commutator PQP^{-1}Q^{-1}. To see why this is so, we must work with two infinitesimals
ε, δ, using the rules ε² = δ² = 0 and εδ = δε. Note that the inverse of the matrix
I + Aε is I − Aε. So if P = I + Aε and Q = I + Bδ, the commutator expands to


(6.19) (I + Aε)(I + Bδ)(I − Aε)(I − Bδ) = I + (AB − BA)εδ.


Intuitively, the bracket is in the Lie algebra because the product of two elements in
G, even infinitesimal ones, is in G, and therefore the commutator of two elements is
also in G.
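
The expansion (6.19) can be carried out by machine. In this sketch, an expression M0 + Me·ε + Md·δ + Med·εδ is stored as a 4-tuple of matrices, and the hypothetical helper `trunc_mul` multiplies two such expressions using only the rules ε² = δ² = 0 and εδ = δε. The matrices A and B are arbitrary examples.

```python
def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(x, y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def negate(x):
    return [[-a for a in row] for row in x]


def trunc_mul(X, Y):
    """Product of two expressions (M0, Me, Md, Med) modulo eps^2 and delta^2."""
    x0, xe, xd, xed = X
    y0, ye, yd, yed = Y
    return (
        mat_mul(x0, y0),                                   # constant term
        mat_add(mat_mul(x0, ye), mat_mul(xe, y0)),         # eps coefficient
        mat_add(mat_mul(x0, yd), mat_mul(xd, y0)),         # delta coefficient
        mat_add(mat_add(mat_mul(x0, yed), mat_mul(xed, y0)),
                mat_add(mat_mul(xe, yd), mat_mul(xd, ye))),  # eps*delta
    )


I = [[1, 0], [0, 1]]
Z = [[0, 0], [0, 0]]
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

P = (I, A, Z, Z)                # I + A eps
Q = (I, Z, B, Z)                # I + B delta
P_inv = (I, negate(A), Z, Z)    # I - A eps
Q_inv = (I, Z, negate(B), Z)    # I - B delta

commutator = trunc_mul(trunc_mul(trunc_mul(P, Q), P_inv), Q_inv)
AB_minus_BA = mat_add(mat_mul(A, B), negate(mat_mul(B, A)))
print(commutator)
```

The constant term is I, the ε and δ coefficients cancel, and the εδ coefficient is the bracket [A, B].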

Using the bracket operation, we can also define the concept of Lie algebra
abstractly.


(6.20) Definition. A Lie algebra V over a field F is a vector space together with a
law of composition

V × V → V

v, w ↦ [v, w]


called the bracket, having these properties:


(i) bilinearity: [v_1 + v_2, w] = [v_1, w] + [v_2, w], [cv, w] = c[v, w],
[v, w_1 + w_2] = [v, w_1] + [v, w_2], [v, cw] = c[v, w],
(ii) skew symmetry: [v, v] = 0,
(iii) Jacobi identity: [u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0,
for all u, v, w ∈ V and all c ∈ F.

The importance of Lie algebras comes from the fact that, being vector spaces,
they are much easier to work with than the linear groups themselves, and at the same
time the classical groups are nearly determined by their Lie algebras. In other
words, the infinitesimal structure of the group at the identity element is almost
enough to determine the group.


7. TRANSLATION IN A GROUP 


We will use one more notion from topology in this section: the definition of
manifold in R^k. This definition is reviewed in the appendix [Definition (3.12)]. Do not be
discouraged if you are not familiar with the concept of manifold. You can learn what
is necessary without much trouble as we go along.
Let P be a fixed element of a matrix group G. We know that left multiplication
by P is a bijective map from G to itself:


(7.1) m_P: G → G, X ↦ PX,


because it has the inverse function m_{P^{-1}}. The maps m_P and m_{P^{-1}} are continuous,
because matrix multiplication is continuous. Thus m_P is a homeomorphism from G to
itself (not a homomorphism). It is also called left translation by P, in analogy with
translation in the plane, which is left translation in the additive group R².

The important property of a group which is implied by the existence of these
maps is homogeneity. Multiplication by P is a homeomorphism which carries the
identity element I to P. So the topological structure of the group G is the same near
I as it is near P, and since P is arbitrary, it is the same in the neighborhoods of any
two points of the group. This is analogous to the fact that the plane looks the same
at any two points.

Left multiplication in SU2 happens to be defined by an orthogonal change of
the coordinates (x_1, x_2, x_3, x_4), so it is a rigid motion of the 3-sphere. But
multiplication by a matrix needn't be a rigid motion, so the sense in which any group is
homogeneous is weaker. For example, let G be the group of real invertible diagonal 2 × 2
matrices, and let us identify the elements of G with the points (a, d) in the plane
which are not on the coordinate axes. Multiplication by the matrix


(7.2) P = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}


distorts the group G, but it does so continuously.


(7.3) Figure. Left multiplication in a group. 

Now the only geometrically reasonable subsets of R^k which have this homogeneity
property are manifolds. A manifold M of dimension d is a subset which is
locally homeomorphic to R^d at any one of its points, meaning that every point
p ∈ M has a neighborhood homeomorphic to an open set in R^d [see Appendix
(3.12)]. It isn't surprising that the classical groups, being homogeneous, are
manifolds, though there are subgroups of GL_n which aren't. The group GL_n(Q) of
invertible matrices with rational coefficients, for example, is a rather ugly set when viewed
geometrically, though it is an interesting group. The following theorem gives a
satisfactory answer to the question of which linear groups are manifolds:


(7.4) Theorem. Let G be a subgroup of GL_n(R) which is a closed set in R^{n×n}.
Then G is a manifold.


Giving the proof of this theorem here would take us too far afield. Instead, we will
illustrate the theorem by showing that the orthogonal groups O_n are manifolds. The
proofs for other classical groups are similar.


(7.5) Proposition. The orthogonal group O_n is a manifold of dimension
½n(n − 1).


Proof. Let us denote the group O_n by G and denote its Lie algebra, the space
of skew-symmetric matrices, by L. Proposition (5.9) tells us that for matrices A near
0, A ∈ L if and only if e^A ∈ G. Also, the exponential is a homeomorphism from a
neighborhood of 0 in R^{n×n} to a neighborhood of I. Putting these two facts together,
we find that the exponential defines a homeomorphism from a neighborhood of 0 in
L to a neighborhood of I in G. Since L is a vector space of dimension ½n(n − 1), it
is a manifold. This shows that the condition of being a manifold is satisfied by the
orthogonal group at the identity. On the other hand, we saw above that any two
points in G have homeomorphic neighborhoods. Therefore G is a manifold, as
claimed. □
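
The dimension count ½n(n − 1) used in this proof comes from the fact that a skew-symmetric matrix is determined by its entries above the diagonal. As a small sketch, the hypothetical function `skew_basis` below lists the matrices E_ij − E_ji with i < j, which form a basis of L, and counts them for several n.

```python
def skew_basis(n):
    """Return the matrices E_ij - E_ji (i < j), a basis of the skew n x n matrices."""
    basis = []
    for i in range(n):
        for j in range(i + 1, n):
            m = [[0] * n for _ in range(n)]
            m[i][j] = 1
            m[j][i] = -1
            basis.append(m)
    return basis


def is_skew(m):
    n = len(m)
    return all(m[i][j] == -m[j][i] for i in range(n) for j in range(n))


dimensions = {n: len(skew_basis(n)) for n in range(1, 7)}
print(dimensions)   # dimension of the Lie algebra of O_n for n = 1, ..., 6
```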


(7.6) Figure. [The exponential map.]

Here is another application of the principle of homogeneity:


(7.7) Proposition. Let G be a path-connected matrix group, and let H ⊂ G be a
subgroup which contains a nonempty open subset of G. Then H = G.


Proof. By hypothesis, H contains a nonempty open subset U of G. Since left
multiplication by g ∈ G is a homeomorphism, gU is also open in G. Each translate
gU is contained in a single coset of H, namely in gH. Since the translates of U cover
G, they cover each coset. In this way, each coset is a union of open subsets of G, and
hence it is open itself. So G is partitioned into open subsets, the cosets of H. Now
a path-connected set is not a disjoint union of proper open subsets [see Appendix,
Proposition (3.11)]. Thus there can be only one coset, and H = G. □


We will now apply this proposition to determine the normal subgroups of SU2.


(7.8) Theorem. The only proper normal subgroup of SU2 is its center {±I}.


Since there is a surjective map φ: SU2 → SO3 whose kernel is {±I}, the rotation
group is isomorphic to a quotient group of SU2 [Chapter 2 (10.9)]:


(7.9) SO3 ≈ SU2/{±I}.


(7.10) Corollary. SO3 is a simple group; that is, it has no proper normal subgroup.

Proof. The inverse image of a normal subgroup in SO3 is a normal subgroup
of SU2 which contains {±I} [Chapter 2 (7.4)]. Theorem (7.8) tells us that there are
no proper ones. □


Proof of Theorem (7.8). It is enough to show that if N is a normal subgroup
of SU2 which is not contained in the center {±I}, then N is the whole group. Now
since N is normal, it is a union of conjugacy classes [Chapter 6 (2.5)]. And we have
seen that the conjugacy classes are the latitudes, the 2-spheres (2.8). By assumption,
N contains a matrix P ≠ ±I, so it contains the whole conjugacy class C = C_P,
which is a 2-sphere. Intuitively, this set looks big enough to generate SU2. For it has
dimension 2 and is not a subgroup. So the set S of all products P^{-1}Q with P, Q ∈ C is
larger than C. Therefore S ought to have dimension 3, which is the dimension of SU2
itself, so it ought to contain an open set in the group.


(7.11) Figure.

To make this intuitive reasoning precise, we choose a nonconstant continuous
map P_t from the unit interval [0, 1] to C such that P_0 = P and P_1 ≠ P. We form the
path


(7.12) Q_t = P^{-1}P_t.


Then Q_0 = I, and Q_1 ≠ I, so this path leads out from I. Since P and P_t are in N, Q_t
is in N for every t ∈ [0, 1]. We don't need to know anything else about the path Q_t.

Let f(t) be the function trace Q_t. This is a continuous function on the interval
[0, 1]. Note that f(0) = 2, while f(1) = τ < 2 because Q_1 ≠ I. By continuity, all
values between τ and 2 are taken on by f in the interval.

Since N is normal, it contains the conjugacy class of Q_t for every t. So since
trace Q_t takes on all values near 2, Proposition (2.9) tells us that N contains all
matrices in SU2 whose trace is sufficiently near to 2, and this includes all matrices
sufficiently near to I. So N contains an open neighborhood of I in SU2. Now SU2,
being a sphere, is path-connected, so Proposition (7.7) completes the proof.


We can also apply translation in a group G to tangent vectors. If A is a tangent
vector at the identity and if P ∈ G is arbitrary, then PA is a tangent vector to G at
the point P. Intuitively, this is because P(I + Aε) = P + PAε is the product of
elements in G, so it lies in G itself. As always, this heuristic is easy to check for a
particular group. We fix A, and associate the tangent vector PA to the element P of G. In
this way we obtain what is called a tangent vector field on the group G. Since A is
nonzero and P is invertible, this vector field does not vanish at any point. Now just
the existence of a tangent vector field which is nowhere zero puts strong restrictions
on the space G. For example, it is a theorem of topology that any vector field on the
2-sphere must vanish at some point. That is why the 2-sphere has no group structure.
But the 3-sphere, being a group, has tangent vector fields which are nowhere zero.
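
For the orthogonal group, the heuristic that P + PAε lies in G can be checked exactly. The sketch below expands (P + PAε)^t (P + PAε) to first order in ε, so the two matrices it computes are the constant term P^t P and the ε-coefficient (PA)^t P + P^t (PA). The orthogonal matrix P with rational entries and the skew-symmetric matrix A are examples chosen here.

```python
from fractions import Fraction


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(x, y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def transpose(x):
    return [list(col) for col in zip(*x)]


third = Fraction(1, 3)
# An orthogonal matrix with rational entries (example chosen for exactness).
P = [[third * e for e in row]
     for row in [[1, 2, 2], [2, 1, -2], [2, -2, 1]]]
# A skew-symmetric matrix, i.e. a tangent vector to O_3 at the identity.
A = [[0, 1, -2], [-1, 0, 3], [2, -3, 0]]

PA = mat_mul(P, A)                       # candidate tangent vector at P
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
zero = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

zeroth_order = mat_mul(transpose(P), P)
first_order = mat_add(mat_mul(transpose(PA), P), mat_mul(transpose(P), PA))
vanishes_nowhere_here = any(entry != 0 for row in PA for entry in row)

print(zeroth_order == identity, first_order == zero, vanishes_nowhere_here)
```

The ε-coefficient reduces to A^t + A, which is zero because A is skew-symmetric, so P + PAε satisfies the defining equation of O_3 to first order.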


8. SIMPLE GROUPS 


Recall that a group G is called simple if it is not the trivial group and if it contains
no proper normal subgroup (Chapter 6, Section 2). So far, we have seen two
nonabelian simple groups: the icosahedral group I ≈ A5 [Chapter 6 (2.3)] and the
rotation group SO3 (7.10). This section discusses the classification of simple groups. We
will omit most proofs.

Simple groups are important for two reasons. First of all, if a group G has a
proper normal subgroup N, then the structure of G is partly described when we
know the structure of N and of the quotient group G/N. If N or G/N has a normal
subgroup, we can further decompose the structure of these groups. In this way we
may hope to describe a particular finite group G, by building it up inductively from
simple groups.

Second, though the condition of being simple is a very strong restriction, simple
groups often appear. The classical linear groups are almost simple. For example,
we saw in the last section that SU2 has center {±I} and that SU2/{±I} ≈ SO3 is a
simple group. The other classical groups have similar properties.

In order to focus attention, we will restrict our discussion here to the complex
groups. We will use the symbol Z to denote the center of any group. The following
theorem would take too much time to prove here, but we will illustrate it in the
special case of SL2(C).


(8.1) Theorem.


(a) The center Z of the special linear group SL_n(C) is a cyclic group, generated by
the matrix ζI where ζ = e^{2πi/n}. The quotient group SL_n(C)/Z is simple if
n ≥ 2.

(b) The center Z of the complex special orthogonal group SO_n(C) is {±I} if n is
even, and is the trivial group {I} if n is odd. The group SO_n/Z is simple if
n = 3 or if n ≥ 5.

(c) The center Z of the symplectic group SP_{2n}(C) is {±I}, and SP_{2n}(C)/Z is simple
if n ≥ 1. □


The group SL_n(C)/Z is called the projective group and is denoted by PSL_n(C):

(8.2) PSL_n(C) = SL_n(C)/Z, where Z = {ζI | ζ^n = 1}.


To illustrate Theorem (8.1), we will prove that PSL2(C) = SL2(C)/{±I} is
simple. In fact, we will show that PSL2(F) is a simple group for almost all fields F.


(8.3) Theorem. Let F be a field which is not of characteristic 2 and which
contains at least seven elements. Then the only proper normal subgroup of SL2(F) is the
subgroup {±I}. Thus PSL2(F) = SL2(F)/{±I} is a simple group.

Since the center of SL2(F) is a normal subgroup, it follows from the theorem
that it is the group {±I}.


(8.4) Corollary. There are infinitely many nonabelian finite simple groups. 


Proof of Theorem (8.3). The proof is algebraic, but it is closely related to the
geometric proof given for the analogous assertion for SU2 in the last section. Our
procedure is to conjugate and multiply until the group is generated. To simplify
notation, we will denote SL2(F) by SL2. Let N be a normal subgroup of SL2 which
contains a matrix A ≠ ±I. We must show that N = SL2. Since one possibility is that N
is the normal subgroup generated by A and its conjugates, we must show that the
conjugates of this one matrix suffice to generate the whole group.

The first step in our proof will be to show that N contains a triangular matrix
different from ±I. Now if our given matrix A has eigenvalues in the field F, then it
will be conjugate to a triangular matrix. But since we want to handle arbitrary fields,
we cannot make this step so easily. Though easy for the complex numbers, this step
is the hardest part of the proof for a general field.


(8.5) Lemma. N contains a triangular matrix A ≠ ±I.


Proof. Let A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} be a matrix in N which is different from ±I. If
c = 0, then A is the required matrix.
Suppose that c ≠ 0. In this case, we will construct a triangular matrix out of A
and its conjugates. We first compute the conjugate


\begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 1 & -x \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a + xc & * \\ c & d - xc \end{bmatrix} = A'.


Since c ≠ 0, we may choose x so that a + xc = 0. The matrix A' is in N, so N
contains a matrix whose upper left entry is zero. We replace A by this matrix, so that it
has the form A = \begin{bmatrix} 0 & b \\ c & d \end{bmatrix}. Unfortunately the zero is in the wrong place.
Note that since det A = 1, bc = −1 in our new matrix A. We now compute the
commutator P^{-1}A^{-1}PA with a diagonal matrix:


P^{-1}A^{-1}PA = \begin{bmatrix} u^{-1} & 0 \\ 0 & u \end{bmatrix} \begin{bmatrix} d & -b \\ -c & 0 \end{bmatrix} \begin{bmatrix} u & 0 \\ 0 & u^{-1} \end{bmatrix} \begin{bmatrix} 0 & b \\ c & d \end{bmatrix} = \begin{bmatrix} u^{-2} & * \\ 0 & u^{2} \end{bmatrix}.


This matrix, which is in our normal subgroup N, is as required unless it is ±I. If so,
then u² = ±1 and u⁴ = 1. But we are free to form the matrix P with an arbitrary
element u in F^×. We will show [Chapter 11 (1.8)] that the polynomial x⁴ − 1 has at
most four roots in any field. So there are at most four elements u ∈ F with u⁴ = 1.
Our hypothesis is that F contains at least seven elements, so F^× contains more than
four elements. So we can choose u ∈ F^× with u⁴ ≠ 1. Then P^{-1}A^{-1}PA is the
required matrix. □
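
The commutator computation in this proof can be verified by brute force over a small prime field. The sketch below assumes the diagonal matrix is P = diag(u, u^{-1}), works in F_p with p = 7, and runs over every matrix A with upper left entry 0 and determinant 1. All function names are chosen here for illustration. The upper right entry of the commutator, which plays no role in the argument, comes out as (1 − u^{-2})bd under this choice of P.

```python
def mat_mul_mod(x, y, p):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]


def inverse_mod(a, p):
    """Inverse of a nonzero residue modulo a prime p (Fermat's little theorem)."""
    return pow(a, p - 2, p)


def commutator(u, b, d, p):
    """P^-1 A^-1 P A for A = [[0, b], [c, d]] with bc = -1 and P = diag(u, 1/u)."""
    c = (-inverse_mod(b, p)) % p
    u_inv = inverse_mod(u, p)
    A = [[0, b], [c, d]]
    A_inv = [[d, (-b) % p], [(-c) % p, 0]]       # valid because det A = 1
    P = [[u, 0], [0, u_inv]]
    P_inv = [[u_inv, 0], [0, u]]
    return mat_mul_mod(mat_mul_mod(P_inv, A_inv, p), mat_mul_mod(P, A, p), p)


def expected(u, b, d, p):
    u_inv_sq = pow(inverse_mod(u, p), 2, p)
    return [[u_inv_sq, (1 - u_inv_sq) * b * d % p], [0, u * u % p]]


p = 7
formula_holds = all(commutator(u, b, d, p) == expected(u, b, d, p)
                    for u in range(1, p)
                    for b in range(1, p)
                    for d in range(p))

# Elements u of F_p^x with u^4 != 1; the proof needs at least one of them.
good_u = {q: [u for u in range(1, q) if pow(u, 4, q) != 1] for q in (5, 7)}
print(formula_holds, good_u)
```

For p = 5 the list of suitable u is empty, which illustrates why this argument needs more than five elements in the field.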


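(8.6) Lemma. N contains a matrix of the form \begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix} with u ≠ 0.


Proof. By the previous lemma, N contains a triangular matrix
A = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix} ≠ ±I. If d ≠ a, let A' = \begin{bmatrix} a & b' \\ 0 & d \end{bmatrix} be its conjugate by the matrix \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.
Then b' = b + d − a. Since det A = ad = 1, the product


A'^{-1}A = \begin{bmatrix} d & -b' \\ 0 & a \end{bmatrix} \begin{bmatrix} a & b \\ 0 & d \end{bmatrix} = \begin{bmatrix} 1 & ad - d^2 \\ 0 & 1 \end{bmatrix}

is the required matrix. If a = d, then a = ±1 because det A = 1, and it follows that
b ≠ 0. In this case, one of the two matrices A or A² is as required. □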


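(8.7) Lemma. Let F be a field. The conjugacy class in SL2 of the matrix \begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix}
contains the matrices \begin{bmatrix} 1 & 0 \\ -u & 1 \end{bmatrix} and \begin{bmatrix} 1 & a^2u \\ 0 & 1 \end{bmatrix}, for all a ≠ 0.


Proof.
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -u & 1 \end{bmatrix} and \begin{bmatrix} a & 0 \\ 0 & a^{-1} \end{bmatrix} \begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a^{-1} & 0 \\ 0 & a \end{bmatrix} = \begin{bmatrix} 1 & a^2u \\ 0 & 1 \end{bmatrix}. □


(8.8) Lemma. Let F be a field of characteristic ≠ 2. The additive group F⁺ of the
field is generated by the squares of elements of F.


Proof. We show that every element x ∈ F can be written in the form
a² − b² = (a + b)(a − b), with a, b ∈ F. To do this, we solve the system of
linear equations a + b = 1, a − b = x. This is where the assumption that the
characteristic of F is not 2 is used. In characteristic 2, these equations need not have a
solution. □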
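
The proof of Lemma (8.8) is constructive: solving a + b = 1, a − b = x gives a = (1 + x)/2 and b = (1 − x)/2. As a minimal sketch, the hypothetical function below carries this out in the prime field F_p for an odd prime p, where division by 2 is multiplication by the inverse of 2 modulo p.

```python
def as_difference_of_squares(x, p):
    """Return (a, b) in F_p with a + b = 1 and a - b = x, so that a^2 - b^2 = x.

    Assumes p is an odd prime, so that 2 is invertible modulo p.
    """
    half = pow(2, p - 2, p)          # inverse of 2 modulo p
    a = (1 + x) * half % p
    b = (1 - x) * half % p
    return a, b


def check_field(p):
    for x in range(p):
        a, b = as_difference_of_squares(x, p)
        if (a * a - b * b) % p != x:
            return False
    return True


results = {p: check_field(p) for p in (3, 5, 7, 11, 13)}
print(results)
print(as_difference_of_squares(3, 7))
```

In characteristic 2 the step `half = ...` has no counterpart, which is exactly where the hypothesis of the lemma enters.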


(8.9) Lemma. Let F be a field of characteristic ≠ 2. If a normal subgroup N of
SL2(F) contains a matrix \begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix} with u ≠ 0, then it contains all such matrices.


Proof. The set of x such that \begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix} ∈ N is a subgroup of F⁺, call it S. We
want to show that S = F⁺. Lemma (8.7) shows that if u ∈ S, then a²u ∈ S for all
a ∈ F. Since the squares generate F⁺, the set of elements {a²u | a ∈ F} generates
the additive subgroup F⁺u of F⁺, and this subgroup is equal to F⁺ because u is
invertible. Thus S = F⁺, as required. □


(8.10) Lemma. For every field F, the group SL2(F) is generated by the elementary
matrices \begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix} and \begin{bmatrix} 1 & 0 \\ u & 1 \end{bmatrix}.


Proof. We perform row reduction on a matrix A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} ∈ SL2(F), using
only the matrices of this form. We start work on the first column, reducing it to e_1.
We eliminate the case c = 0 by adding the first row to the second if necessary. Then
we add a multiple of the second row to the first to change a to 1. Finally, we clear
out the entry c. At this point, the matrix has the form A' = \begin{bmatrix} 1 & b' \\ 0 & d' \end{bmatrix}. Then
d' = det A' = det A = 1, and we can clear out the entry b', ending up with the
identity matrix. Since we needed four operations or less to reduce to the identity, A
is a product of at most four of these elementary matrices. □


The proof of Theorem (8.3) is completed by combining Lemmas (8.6), (8.7),
(8.9), and (8.10). □


A famous theorem of Cartan asserts that the list (8.1) of simple groups is almost
complete. Of course there are other simple groups; for instance, we have just
proved that PSL2(F) is simple for most fields F. But if we restrict ourselves to
complex algebraic groups, the list of simple groups becomes very short.

A subgroup G of GL_n(C) is called a complex algebraic group if it is the set of
solutions of a finite system of polynomial equations in the matrix entries. This is
analogous to the concept of a real algebraic group introduced in Section 6. It will
not be apparent why the property of being defined by polynomial equations is a
reasonable one, but one thing is easy to see: Except for the unitary groups U_n and SU_n,
all the complex classical groups are complex algebraic groups.


(8.11) Theorem.


(a) The groups PSL_n(C) = SL_n(C)/Z, SO_n(C)/Z, and SP_{2n}(C)/Z are
path-connected complex algebraic groups.

(b) In addition to the isomorphism classes of these groups, there are exactly five
isomorphism classes of simple, path-connected complex algebraic groups,
called the exceptional groups.


Theorem (8.11) is too hard to prove here. It is based on a classification of the
corresponding Lie algebras. What we should learn is that there are not many simple
algebraic groups. This ought to be reassuring after the last chapter, where structures
on a vector space were introduced one after the other, each with its own group of
symmetries. There seemed to be no end. Now we see that we actually ran across
most of the possible symmetry types, at least those associated to simple algebraic
groups. It is no accident that these structures are important.


A large project, the classification of the finite simple groups, was completed in
1980. The finite simple groups we have seen are the groups of prime order, the
icosahedral group I ≈ A5 [Chapter 6 (2.3)], and the groups PSL2(F) where F is a
finite field (8.3), but there are many more. The alternating groups A_n are simple for
all n ≥ 5.

Linear groups play a dominant role in the classification of the finite simple 
groups as well as of the complex algebraic groups. Each of the forms (8.11) leads to 
a whole series of finite simple groups when finite fields are substituted for the com- 
plex field. Also, some finite simple groups are analogous to the unitary groups. All 
of these finite linear groups are said to be of Lie type. 

According to Theorem (8.3), PSL2(F_7) is a finite simple group; its order is 168.
This is the second smallest nonabelian simple group; A5 is the smallest. The orders of
the smallest nonabelian simple groups are


(8.12) 60, 168, 360, 504, 660, 1092, 2448.

For each of these seven integers N, there is a single isomorphism class of simple
groups of order N, and it is represented by PSL2(F) for a suitable finite field F. [The
alternating group A5 happens to be isomorphic to PSL2(F_5).]
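
The list (8.12) can be compared with the orders of the groups PSL2(F_q). The sketch below relies on a standard count that is not derived in this section, namely |SL2(F_q)| = q(q² − 1), together with the fact that the center {±I} has order 2 when q is odd and order 1 when q is a power of 2. The assignment of a field size q to each entry of the list is made here for illustration. Note that q = 5 and q = 8 fall outside the hypotheses of Theorem (8.3) as stated. For prime q the count is also confirmed by direct enumeration.

```python
from math import gcd


def psl2_order(q):
    """Order of PSL2(F_q), assuming |SL2(F_q)| = q(q^2 - 1) for a prime power q."""
    return q * (q * q - 1) // gcd(2, q - 1)


def count_sl2(p):
    """Count the 2 x 2 matrices of determinant 1 over F_p by brute force (p prime)."""
    return sum(1
               for a in range(p) for b in range(p)
               for c in range(p) for d in range(p)
               if (a * d - b * c) % p == 1)


field_sizes = [5, 7, 9, 8, 11, 13, 17]       # prime powers, one per entry of (8.12)
orders = [psl2_order(q) for q in field_sizes]
print(orders)

brute_force = {p: count_sl2(p) for p in (5, 7, 11)}
print(brute_force)
```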

In addition to the groups of prime order, the alternating groups, and the
groups of Lie type, there are exactly 26 finite simple groups called the sporadic
groups. The smallest sporadic group is the Mathieu group M11, whose order is 7920.
The largest is called the Monster; its order is roughly 10^53. So the finite simple
groups form a list which, though longer, is somewhat analogous to the list (8.11) of
simple algebraic groups.


It seems unfair to crow about the successes of a theory 
and to sweep all its failures under the rug. 


Richard Brauer 


EXERCISES 
1. The Classical Linear Groups


1. (a) Find a subgroup of GL2(R) which is isomorphic to C^×.

(b) Prove that for every n, GL_n(C) is isomorphic to a subgroup of GL_{2n}(R).

2. Show that SO2(C) is not a bounded set in C⁴.

3. Prove that SP2(R) = SL2(R), but that SP4(R) ≠ SL4(R).

4. According to Sylvester's Law, every 2 × 2 real symmetric matrix is congruent to exactly
one of six standard types. List them. If we consider the operation of GL2(R) on 2 × 2
matrices by P, A ↦ PAP^t, then Sylvester's Law asserts that the symmetric matrices form
six orbits. We may view the symmetric matrices as points in R³, letting (x, y, z)
correspond to the matrix \begin{bmatrix} x & y \\ y & z \end{bmatrix}. Find the decomposition of R³ into orbits explicitly, and
make a clear drawing showing the resulting geometric configuration.

5. A matrix P is orthogonal if and only if its columns form an orthonormal basis. Describe
the properties that the columns of a matrix must have in order for it to be in the Lorentz
group O_{3,1}.

6. Prove that there is no continuous isomorphism from the orthogonal group O4 to the
Lorentz group O_{3,1}.

7. Describe by equations the group O_{1,1}, and show that it has four connected components.

8. Describe the orbits for the operation of SL2(R) on the space of real symmetric matrices
by P, A ↦ PAP^t.

9. Let F be a field whose characteristic is not 2. Describe the orbits for the action
P, A ↦ PAP^t of GL2(F) on the space of 2 × 2 symmetric matrices with coefficients in F.

10. Let F = F_2. Classify the orbits of GL_n(F) for the action on the space of symmetric n × n
matrices by finding representatives for each congruence class.

11. Prove that the following matrices are symplectic, if the blocks are n × n:
\begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}, \begin{bmatrix} A^t & 0 \\ 0 & A^{-1} \end{bmatrix}, \begin{bmatrix} I & B \\ 0 & I \end{bmatrix}, where B = B^t and A is invertible.

12. Prove that the symplectic group SP_{2n}(R) operates transitively on R^{2n}.
*13. Prove that SP_{2n}(R) is path-connected, and conclude that every symplectic matrix has
determinant 1.


2. The Special Unitary Group SU2


1. Let P, Q be elements of SU2, represented by the real vectors (x_1, x_2, x_3, x_4), (y_1, y_2, y_3, y_4).
Compute the real vector which corresponds to the product PQ.

2. Prove that the subgroup SO2 of SU2 is conjugate to the subgroup T of diagonal matrices.

3. Prove that SU2 is path-connected. Do the same for SO3.

4. Prove that U2 is homeomorphic to the product S³ × S¹.

5. Let G be the group of matrices of the form \begin{bmatrix} x & y \\ 0 & 1 \end{bmatrix}, where x, y ∈ R and x > 0.
Determine the conjugacy classes in G, and draw them in the (x, y)-plane.

6. (a) Prove that every element P (2.4) of SU2 can be written as a product: P = DRD',
where D, D' ∈ T (2.13), and R ∈ SO2 is a rotation through an angle θ with
0 ≤ θ ≤ π/2.

(b) Assume that the matrix entries a, b of P are not zero. Prove that this representation is
unique, except that the pair D, D' can be replaced by −D, −D'.

(c) Describe the double cosets TPT, P ∈ SU2. Prove that if the entries a, b of P are not
zero, then the double coset is homeomorphic to a torus, and describe the remaining
double cosets.


3. The Orthogonal Representation of SU2


1. Compute the stabilizer H of the matrix \ : for the action of conjugation by SU2, and
describe φ(P) for P ∈ H.

2. Prove that every great circle in SU2 is a coset of one of the longitudes (2.14).

3. Find a subset of R* which is homeomorphic to the space S X © of (3.4).

4. Derive a formula for ⟨A, A⟩ in terms of the determinant of A.

5. The rotation group SO3 may be mapped to the 2-sphere by sending a rotation matrix to
its first column. Describe the fibres of this map.

6. Extend the map φ defined in this section to a homomorphism Φ: U2 → SO3, and
describe the kernel of Φ.

7. Prove by direct computation that the matrix (3.11) is in SO3.

8. Describe the conjugacy classes in SO3 carefully, relating them to the conjugacy classes of
SU2.

9. Prove that the operation of SU2 on any conjugacy class other than {I}, {−I} is by
rotations of the sphere.

10. Find a bijective correspondence between elements of SO3 and pairs (p, v) consisting of a
point p on the unit 2-sphere S and a unit tangent vector v to S at p.

11. Prove Proposition (3.20).
*12. (a) Calculate left multiplication by a fixed matrix P in SU2 explicitly in terms of the
coordinates x_1, x_2, x_3, x_4. Prove that it is multiplication by a 4 × 4 orthogonal matrix
Q, hence that it is a rigid motion of the unit 3-sphere S³.

(b) Prove that Q is orthogonal by a method similar to that used in describing the
orthogonal representation: Express the dot product of the vectors (x_1, x_2, x_3, x_4), (x_1', x_2', x_3', x_4')
corresponding to two matrices P, P' ∈ SU2 in terms of matrix operations.

(c) Determine the matrix which describes the operation of conjugation by a fixed matrix
P on SU2.

*13. (a) Let H_i be the subgroup of SO3 of rotations about the x_i-axis, i = 1, 2, 3. Prove that
every element of SO3 can be written as a product ABA', where A, A' ∈ H_1 and
B ∈ H_2. Prove that this representation is unique unless B = I.

(b) Describe the double cosets H_1QH_1 geometrically.

*14. Let H_i be the subgroup of SO3 of rotations about the x_i-axis. Prove that every element
Q ∈ SO3 can be written in the form A_1A_2A_3, with A_i ∈ H_i.


4. The Special Linear Group SL2(R)


1. Let G = SL2(C). Use the operation of G on rays {rX | r ∈ R, r > 0} in C² to prove that
G is homeomorphic to the product SU2 × H, where H is the stabilizer of the ray {re_1}, and
describe H explicitly.

2. (a) Prove that the rule P, A ↦ PAP* defines an operation of SL2(C) on the space W of
all hermitian matrices.
(b) Prove that the function ⟨A, A'⟩ = det(A + A') − det A − det A' is a bilinear form on
W, whose signature is (3, 1).
(c) Use (a) and (b) to define a homomorphism φ: SL2(C) → O_{3,1}, whose kernel is
{±I}.
*(d) Prove that the image of φ is the connected component of the identity in O_{3,1}.
3. Let P be a matrix in SO3(C).
(a) Prove that 1 is an eigenvalue of P.
(b) Let X_1, X_2 be eigenvectors for P, with eigenvalues λ_1, λ_2. Prove that X_1^t X_2 = 0,
unless λ_1 = λ_2^{-1}.
(c) Prove that if X is an eigenvector with eigenvalue 1 and if P ≠ I, then X^t X ≠ 0.

4. Let G = SO3(C).

(a) Prove that left multiplication by G is a transitive operation on the set of vectors X
such that X^t X = 1.

(b) Determine the stabilizer of e_1 for left multiplication by G.

(c) Prove that G is path-connected.


5. One-Parameter Subgroups 


1. Determine the differentiable homomorphisms from C⁺ to SL_n(C).

2. Describe all one-parameter subgroups of C^×.

3. Describe by equations the images of all one-parameter subgroups of the group of real
2 × 2 diagonal matrices, and make a neat drawing showing them.

4. Let φ: R⁺ → GL_n(R) be a one-parameter subgroup. Prove that ker φ is either trivial,
or the whole group, or else it is infinite cyclic.

5. Find the conditions on a matrix A so that e^{tA} is a one-parameter subgroup of the special
unitary group SU_n, and compute the dimension of that group.

6. Let G be the group of real matrices of the form \begin{bmatrix} x & y \\ 0 & 1 \end{bmatrix}, x > 0.

(a) Determine the matrices A such that e^{tA} is a one-parameter subgroup of G.
(b) Compute e^A explicitly for the matrices determined in (a).
(c) Make a drawing showing the one-parameter subgroups in the (x, y)-plane.

7. Prove that the images of the one-parameter subgroups of SU2 are the conjugates of T (see
Section 3). Use this to give an alternative proof of the fact that these conjugates are the
longitudes.

8. Determine the one-parameter subgroups of U2.

9. Let φ(t) = e^{tA} be a one-parameter subgroup of G. Prove that the cosets of im φ are
matrix solutions of the differential equation dX/dt = AX.

10. Can a one-parameter subgroup of GL_n(R) cross itself?
*11. Determine the differentiable homomorphisms from SO2 to GL_n(R).


6. The Lie Algebra 


1. Compute (A + Bε)^{-1}, assuming that A is invertible.
2. Compute the infinitesimal tangent vectors to the plane curve y² = x³ at the point (1, 1)
and at the point (0, 0).
3. (a) Sketch the curve C: x? = x3 — x)’.
(b) Prove that this locus is a manifold of dimension 1 if the origin is deleted.
(c) Determine the tangent vectors and the infinitesimal tangents to C at the origin.
4. Let S be a real algebraic set defined by one equation f = 0.
(a) Show that the equation f² = 0 defines the same locus S.
(b) Show that ∇(f²) vanishes at every point x of S, hence that every vector is an
infinitesimal tangent at x, when the defining equation is taken to be f² = 0.
5. Show that the set defined by xy = 1 is a subgroup of the group of diagonal matrices
\begin{bmatrix} x & 0 \\ 0 & y \end{bmatrix}, and compute its Lie algebra.
6. Determine the Lie algebra of the unitary group.
7. (a) Prove the formula det(I + Aε) = 1 + (trace A)ε.
(b) Let A be an invertible matrix. Compute det(A + Bε).
8. (a) Show that O, operates by conjugation on its Lie algebra.
(b) Show that the operation in (a) is compatible with the bilinear form ⟨A, B⟩ =
4 trace AB.
(c) Use the operation in (a) to define a homomorphism O,——> O2, and describe this
homomorphism explicitly.
9. Compute the Lie algebra of the following: (a) U_n; (b) SU_n; (c) O_{3,1}; (d) SO_n(C). In each
case, show that e^{tA} is a one-parameter subgroup if and only if I + Aε lies in the group.
*10. Determine the Lie algebra of G = SP_{2n}(R), using block form M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}.
11. (a) Show that R³ becomes a Lie algebra if the bracket is defined to be the cross product
[X, Y] = X × Y = (x_2y_3 − y_2x_3, x_3y_1 − y_3x_1, x_1y_2 − y_1x_2).
(b) Show that this Lie algebra is isomorphic to the Lie algebra of SO3.
12. Classify all complex Lie algebras of dimension ≤ 3.
*13. The adjoint representation of a linear group G is the representation by conjugation on its
Lie algebra: G × L → L is defined to be P, A ↦ PAP^{-1}. The form ⟨A, A'⟩ = trace(AA')
on L is called the Killing form. For each of the following groups, verify that if P ∈ G
and A ∈ L, then PAP^{-1} ∈ L, and prove that the Killing form is symmetric and bilinear
and that the operation is compatible with the form, i.e., that ⟨A, A'⟩ = ⟨PAP^{-1}, PA'P^{-1}⟩.
(a) SO_n (b) SU_n (c) O_{3,1} (d) SO_n(C) (e) SP_{2n}(R)
14. Prove that the Killing form is negative definite on the Lie algebra of (a) SU_n and (b) SO_n.
15. Determine the signature of the Killing form on the Lie algebra of SL_n(R).
16. (a) Use the adjoint representation of SU_n to define a homomorphism φ: SU_n → SO_m,
where m = n² − 1.
(b) Show that when n = 2, this representation is equivalent to the orthogonal representation
defined in Section 3.
17. Use the adjoint representation of SL_2(C) to define an isomorphism SL_2(C)/{±I} ≈
SO_3(C).


7. Translation in a Group 


1. Compute the dimensions of the following groups.
(a) SU_n (b) SO_n(C) (c) SP_{2n}(R) (d) O_{3,1}
2. Using the exponential, find all solutions near I of the equation P² = I.
3. Find a path-connected, nonabelian subgroup of GL2(R) of dimension 2. 
4. (a) Show that every positive definite hermitian matrix A is the square of another positive 
definite hermitian matrix B. 
(b) Show that B is uniquely determined by A. 
*5. Let A be a nonsingular matrix, and let B be a positive definite hermitian matrix such that
B² = AA*.
(a) Show that A*B^{-1} is unitary.
(b) Prove the Polar decomposition: Every nonsingular matrix A is a product A = PU,
where P is positive definite hermitian and U is unitary.
(c) Prove that the Polar decomposition is unique.
(d) What does this say about the operation of left multiplication by the unitary group U_n
on the group GL_n?
*6. State and prove an analogue of the Polar decomposition for real matrices. 
*7. (a) Prove that the exponential map defines a bijection between the set of all hermitian 
matrices and the set of positive definite hermitian matrices. 
(b) Describe the topological structure of GL2(C) using the Polar decomposition and (a). 
8. Let B be an invertible matrix. Describe the matrices A such that P = e^A is in the
centralizer of B.
*9. Let S denote the set of matrices P ∈ SL_2(R) with trace r. These matrices can be written
in the form \begin{bmatrix} x & y \\ z & r - x \end{bmatrix}, where (x, y, z) lies on the quadric x(r − x) − yz = 1.
(a) Show that the quadric is either a hyperboloid of one or two sheets, or else a cone, and
determine the values of r which correspond to each type.
(b) In each case, determine the decomposition of the quadric into conjugacy classes.
(c) Extend the method of proof of Theorem (7.11) to show that the only proper normal
subgroup of SL_2(R) is {±I}.
10. Draw the tangent vector field PA to the group C^×, when A = 1 + i.


8. Simple Groups 


1. Which of the following subgroups of GL_n(C) are complex algebraic groups?
(a) GL_n(Z) (b) SU_n (c) upper triangular matrices
2. (a) Write the polynomial functions in the matrix entries which define SO_n(C).
(b) Write out the polynomial equations which define the symplectic group.
(c) Show that the unitary group U_n can be defined by real polynomial equations in the
real and imaginary parts of the matrix entries.
3. Determine the centers of the groups SL_n(R) and SL_n(C).
4. Describe isomorphisms (a) PSL_2(F_2) ≈ S_3 and (b) PSL_2(F_3) ≈ A_4.
5. Determine the conjugacy classes of GL_2(F_3).
6. Prove that SL_2(F) = PSL_2(F) for any field F of characteristic 2.
7. (a) Determine all normal subgroups of GL_2(C) which contain its center Z = {cI}.
(b) Do the same for GL_2(R).
8. For each of the seven orders (8.12), determine the order of the field F such that PSL_2(F)
has order n.
9. Prove that there is a simple group of order 3420.
10. (a) Let Z be the center of GL_n(C). Is PSL_n(C) isomorphic to GL_n(C)/Z?
(b) Answer the same question as in (a), with R replacing C.
11. Prove that PSL_2(F_5) is isomorphic to A_5.
12. Analyze the proof of Theorem (8.3) to prove that PSL_2(F) is a simple group when F is a
field of characteristic 2, except for the one case F = F_2.
13. (a) Let P be a matrix in the center of SO_n, and let A be a skew-symmetric matrix. Prove
that PA = AP by differentiating the matrix function e^{At}.
(b) Prove that the center of SO_n is trivial if n is odd and is {±I} if n is even and ≥ 4.
14. Compute the order of the following.
(a) SO_2(F_3) and SO_3(F_3)
(b) SO_2(F_5) and SO_3(F_5)
15. (a) Consider the operation of SL_2(C) by conjugation on the space V of complex 2 × 2
matrices. Show that with the basis (e_11, e_12, e_21, e_22) of V, the matrix of conjugation
by A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} has the block form \begin{bmatrix} aB & bB \\ cB & dB \end{bmatrix}, where B = (A^t)^{-1} = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix}.
(b) Prove that this operation defines a homomorphism φ: SL_2(C) → GL_4(C), and that
the image of φ is isomorphic to PSL_2(C).
(c) Prove that PSL_2(C) is an algebraic group by finding polynomial equations in the
entries y_ij of a 4 × 4 matrix whose solutions are precisely the matrices in im φ.
*16. Prove that PSL_n(C) is a simple group.
*17. There is no simple group of order 2^5 · 7 · 11. Assuming this, determine the next smallest
order after 2448 for a nonabelian simple group.
Miscellaneous Exercises


1. Quaternions are expressions of the form α = a + bi + cj + dk, where a, b, c, d ∈ R.
They can be added and multiplied using the rules of multiplication for the quaternion
group [Chapter 2 (2.12)].

(a) Let ᾱ = a − bi − cj − dk. Compute αᾱ.
(b) Prove that every α ≠ 0 has a multiplicative inverse.
(c) Prove that the set of quaternions α such that a² + b² + c² + d² = 1 forms a group
under multiplication which is isomorphic to SU_2.
2. The affine group A_n = A_n(R) is the group of coordinate changes in (x_1,..., x_n) which is
generated by GL_n(R) and by the group T_n of translations: t_a(x) = x + a. Prove that T_n is
a normal subgroup of A_n and that A_n/T_n is isomorphic to GL_n(R).
3. Cayley Transform: Let U denote the set of matrices A such that I + A is invertible, and
define A' = (I − A)(I + A)^{-1}.
(a) Prove that if A ∈ U, then A' ∈ U, and prove that A'' = A.
(b) Let V denote the vector space of real skew-symmetric n × n matrices. Prove that the
rule A ↦ (I − A)(I + A)^{-1} defines a homeomorphism from a neighborhood of 0 in
V to a neighborhood of I in SO_n.
(c) Find an analogous statement for the unitary group.
(d) Let J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}. Show that a matrix A ∈ U is symplectic if and only if
(A')^t J = −JA'.
*4. Let p(t) = t² − ut + 1 be a quadratic polynomial, with coefficients in the field F = F_p.
(a) Prove that if p has two distinct roots in F, then the matrices with characteristic
polynomial p form two conjugacy classes in SL_2(F), and determine their orders.
(b) Prove that if p has two equal roots, then the matrices with characteristic polynomial
p form three conjugacy classes in SL_2(F), and determine their orders.
(c) Suppose that p has no roots in F. Determine the centralizer of the matrix
A = \begin{bmatrix} 0 & -1 \\ 1 & u \end{bmatrix} in SL_2(F), and compute the order of the conjugacy class of A.


(d) Find the class equations of SL_2(F_3) and SL_2(F_5).

(e) Find the class equations of PSL_2(F_3) and PSL_2(F_5), and reconcile your answer with
the class equations of A_4 and A_5.

(f) Compute the class equation for SL_2(F_7) and for PSL_2(F_7). Use the class equation for
PSL_2(F_7) to show that this group is simple.
Group Representations


A tremendous effort has been made by mathematicians 
for more than a century to clear up the chaos in group theory. 
Still, we cannot answer some of the simplest questions. 


Richard Brauer 


1. DEFINITION OF A GROUP REPRESENTATION


Operations of a group on an arbitrary set were studied in Chapter 5. In this chap- 
ter we consider the case that the group elements act as linear operators on a 
vector space. Such an operation defines a homomorphism from G to the general 
linear group. A homomorphism to the general linear group is called a matrix repre- 
sentation. 

The finite rotation groups are good examples to keep in mind. The group T of 
rotations of a tetrahedron, for example, operates on a three-dimensional space V by 
rotations. We didn’t write down the matrices which represent this action explicitly in 
Chapter 5; let us do so now. A natural choice of basis has the coordinate axes pass- 
ing through the midpoints of three of the edges, as illustrated below: 


(1.1) Figure. [A tetrahedron with coordinate axes v_1, v_2, v_3 passing through the midpoints of three of its edges.]
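Let y_i ∈ T denote the rotation by π around an edge, and let x ∈ T denote rotation
by 2π/3 around the front vertex. The matrices representing these operations are


(1.2) R_{y_1} = \begin{bmatrix} 1 & & \\ & -1 & \\ & & -1 \end{bmatrix}, R_{y_2} = \begin{bmatrix} -1 & & \\ & 1 & \\ & & -1 \end{bmatrix}, R_{y_3} = \begin{bmatrix} -1 & & \\ & -1 & \\ & & 1 \end{bmatrix},

R_x = \begin{bmatrix} & & 1 \\ 1 & & \\ & 1 & \end{bmatrix}.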

The rotations {y_i, x} generate the group T, and the matrices {R_{y_i}, R_x} generate an
isomorphic group of matrices.

It is also easy to write down matrices which represent the actions of C_n, D_n,
and O explicitly, but I is fairly complicated.
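
As a sanity check on (1.2), here is a minimal Python sketch that generates a matrix group from two of the generators and counts its elements. It assumes R_{y_1} = diag(1, −1, −1) and takes R_x to be the cyclic permutation matrix; the helper names `matmul` and `generate_group` are illustrative and not from the text.

```python
def matmul(a, b):
    """Multiply two square matrices stored as tuples of tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

# Assumed entries of two of the matrices in (1.2).
R_y1 = ((1, 0, 0), (0, -1, 0), (0, 0, -1))
R_x = ((0, 0, 1), (1, 0, 0), (0, 1, 0))

def generate_group(generators):
    """Close a set of finite-order matrices under multiplication."""
    identity = tuple(tuple(int(i == j) for j in range(3)) for i in range(3))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = matmul(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

T = generate_group([R_y1, R_x])
print(len(T))  # number of distinct matrices generated
```

Under these assumptions the closure has 12 elements, the order of the tetrahedral group, which is consistent with the representation being faithful.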

An n-dimensional matrix representation of a group G is a homomorphism


(1.3) R: G → GL_n(F),


where F is a field. We will use the notation R_g for the image of g. So each R_g is an
invertible matrix, and multiplication in G carries over to matrix multiplication; that
is, R_{gh} = R_g R_h. The matrices (1.2) describe a three-dimensional matrix representation
of T. It happens to be faithful, meaning that R is an injection and therefore maps
T isomorphically to its image, a subgroup of GL_3(R). Matrix representations are not
required to be faithful.

When we study representations, it is essential to work as much as possible 
without fixing a basis, and to facilitate this, we introduce the concept of a represen- 
tation of a group on a finite-dimensional vector space V. We denote by 


(1.4) GL(V) 


the group of invertible linear operators on V, the multiplication law being, as always, 
composition of functions. The choice of a basis of V defines an isomorphism of this 
group with the group of invertible matrices: 


(1.5) GL(V) → GL_n(F),
T ↦ matrix of T.
By a representation of G on V, we mean a homomorphism
(1.6) p: G → GL(V).


The dimension of the representation p is defined to be the dimension of the vector 
space V. We will study only representations on finite-dimensional vector spaces. 
Matrix representations can be thought of as representations of G on the space 
F" of column vectors. 
Let p be a representation. We will denote the image of an element g in GL(V)
by p_g. Thus p_g is a linear operator on V, and p_{gh} = p_g p_h. If a basis B = (v_1,..., v_n)
is given, the representation p defines a matrix representation R by the rule
(1.7) R_g = matrix of p_g.

We may write this matrix symbolically, as in Chapter 4 (3.1), as

(1.8) p_g(B) = BR_g.

If X is the coordinate vector of a vector v ∈ V, that is, if v = BX, then
(1.9) R_g X is the coordinate vector of p_g(v).


The rotation groups are examples of representations on a real vector space V 
without regard to a choice of basis. The rotations are linear operators in GL(V). In 
(1.1) we chose a basis for V, thereby realizing the elements of T as the matrices
(1.2) and obtaining a matrix representation. 

So all representations of G on finite-dimensional vector spaces can be reduced 
to matrix representations if we are willing to choose a basis. We may need to choose 
one in order to make explicit calculations, but then we must study what happens 
when we change our basis, which properties are independent of the choice of basis, 
and which choices are the good ones. 

A change of basis in V given by a matrix P changes a matrix representation R
to a conjugate representation R' = PRP^{-1}, that is,


(1.10) R'_g = P R_g P^{-1} for every g.


This follows from rule (3.4) in Chapter 4 for change of basis. 
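
The rule (1.10) can be checked on a small case. The sketch below is only an illustration under stated assumptions: it uses the cyclic group of order 3 represented by powers of a 2 × 2 integer matrix A of order 3, and an arbitrary invertible matrix P. Both matrices and the helper names are hypothetical choices, not taken from the text.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def inverse2(m):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[-1, -1], [1, 0]]          # assumed matrix with A^3 = I
I2 = [[1, 0], [0, 1]]
R = {0: I2, 1: A, 2: matmul(A, A)}   # R_k = A^k for k in Z/3

P = [[1, 2], [0, 1]]            # an arbitrary change-of-basis matrix
P_inv = inverse2(P)
R_conj = {k: matmul(matmul(P, R[k]), P_inv) for k in R}   # R'_g = P R_g P^{-1}

# The conjugate family is still a homomorphism, and traces are unchanged.
is_homomorphism = all(
    R_conj[(j + k) % 3] == matmul(R_conj[j], R_conj[k]) for j in R for k in R
)
traces_agree = all(
    R[k][0][0] + R[k][1][1] == R_conj[k][0][0] + R_conj[k][1][1] for k in R
)
print(is_homomorphism, traces_agree)
```

Exact rational arithmetic is used so that the equalities are tested without rounding error.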

There is an equivalent concept, namely that of operation of a group G on a 
vector space V. When we speak of an operation on a vector space, we always mean 
one which is compatible with the vector space structure—otherwise we shouldn’t be 
thinking of V as a vector space. So such an operation is a group operation in the 
usual sense [Chapter 5 (5.1)]: 


(1.11) 1v = v and (gh)v = g(hv),


for all g, h ∈ G and all v ∈ V. In addition, every group element is required to act
on V as a linear operator. Writing out what this means, we obtain the rules


(1.12) g(v + v') = gv + gv' and g(cv) = cgv


which, when added to (1.11), give a complete list of axioms for an operation of G on 
the vector space V. Since G does operate on the underlying set of V, we can speak of 
orbits and stabilizers as before. 

The two concepts “operation of G on V” and “representation of G on V” are 
equivalent for the same reason that an operation of a group G on a set S is equivalent 
to a permutation representation (Chapter 5, Section 8): Given a representation p of 
G on V, we define an operation by the rule 


(1.13) gv = p_g(v),
and conversely, given an operation, the same formula can be used to define the operator
p_g for all g ∈ G. It is a linear operator because of (1.12), and the associative
law (1.11) shows that p_g p_h = p_{gh}.

Thus we have two notations (1.13) for the action of g on v, and we will use 
them interchangeably. The notation gv is more compact, so we use it when possible. 

In order to focus our attention, and because they are the easiest to handle, we 
will concentrate on complex representations for the rest of this chapter. Therefore 
the vector spaces V which occur are to be interpreted as complex vector spaces, and 
GL_n will denote the complex general linear group GL_n(C). Every real matrix
representation, such as the three-dimensional representation (1.2) of the rotation group
T, can be used to define a complex representation, simply by interpreting the real 
matrices as complex matrices. We will do this without further comment. 


2. G-INVARIANT FORMS AND UNITARY REPRESENTATIONS 


A matrix representation R: G → GL_n is called unitary if all the matrices R_g are unitary,
that is, if the image of the homomorphism R is contained in the unitary group.
In other words, a unitary representation is a homomorphism


(2.1) R: G → U_n,


from G to the unitary group. 
In this section we prove the following remarkable fact about representations of 
finite groups. 


(2.2) Theorem.


(a) Every finite subgroup of GL_n is conjugate to a subgroup of U_n.

(b) Every matrix representation R: G → GL_n of a finite group G is conjugate to a
unitary representation. In other words, given R, there is a matrix P ∈ GL_n
such that P R_g P^{-1} ∈ U_n for every g ∈ G.


(2.3) Corollary. 


(a) Let A be an invertible matrix of finite order in GL_n, that is, such that A^r = I
for some r. Then A is diagonalizable: There is a P ∈ GL_n so that PAP^{-1} is diagonal.


(b) Let R: G → GL_n be a representation of a finite group G. Then for every
g ∈ G, R_g is a diagonalizable matrix.


Proof of the corollary. (a) The matrix A generates a finite subgroup of GL_n.
By Theorem (2.2), this subgroup is conjugate to a subgroup of the unitary group. 
Hence A is conjugate to a unitary matrix. The Spectral Theorem for normal operators 
[Chapter 7 (7.3)] tells us that every unitary matrix is diagonalizable. Hence A is 
diagonalizable. 




(b) The second part of the corollary follows from the first, because every element
g of a finite group has finite order. Since R is a homomorphism, R_g has finite
order too. □


The two parts of Theorem (2.2) are more or less the same. We can derive (a)
from (b) by considering the inclusion map of a finite subgroup into GL_n as a matrix
representation of the group. Conversely, (b) follows by applying (a) to the image
of R.

In order to prove part (b), we restate it in basis-free terminology. Consider a
hermitian vector space V (a complex vector space together with a positive definite
hermitian form ⟨ , ⟩). A linear operator T on V is unitary if ⟨v, w⟩ = ⟨T(v), T(w)⟩ for
all v, w ∈ V [Chapter 7 (5.2)]. Therefore it is natural to call a representation
p: G → GL(V) unitary if p_g is a unitary operator for all g ∈ G, that is, if


(2.4) ⟨v, w⟩ = ⟨p_g(v), p_g(w)⟩,


for all v, w ∈ V and all g ∈ G. The matrix representation R (1.7) associated to a
unitary representation p will be unitary in the sense of (2.1), provided that the basis
is orthonormal. This follows from Chapter 7 (5.2b).

To simplify notation, we will write condition (2.4) as


(2.5) ⟨v, w⟩ = ⟨gv, gw⟩.


We now turn this formula around and view it as a condition on the form instead of
on the operation. Given a representation p of G on a vector space V, a form ⟨ , ⟩ on V
is called G-invariant if (2.4), or equivalently, (2.5) holds.


(2.6) Theorem. Let p be a representation of a finite group G on a complex vector
space V. There exists a G-invariant, positive definite hermitian form ⟨ , ⟩ on V.


Proof. We start with an arbitrary positive definite hermitian form on V; say
we denote it by { , }. We will use this form to define a G-invariant form, by averaging
over the group. Averaging over G is a general method which will be used again.
It was already used in Chapter 5 (3.2) to find a fixed point of a finite group operation
on the plane. The form ⟨ , ⟩ we want is defined by the rule


(2.7) ⟨v, w⟩ = (1/N) Σ_{g∈G} {gv, gw},


where N = |G| is the order of G. The normalization factor 1/N is customary but
unimportant. Theorem (2.6) follows from this lemma:
(2.8) Lemma. The form (2.7) is hermitian, positive definite, and G-invariant.


Proof. The verification of the first two properties is completely routine. For example,


{gv, g(w + w')} = {gv, gw + gw'} = {gv, gw} + {gv, gw'}.
Therefore


⟨v, w + w'⟩ = (1/N) Σ_{g∈G} {gv, g(w + w')} = (1/N) Σ_{g∈G} {gv, gw} + (1/N) Σ_{g∈G} {gv, gw'}
= ⟨v, w⟩ + ⟨v, w'⟩.

To show that the form ⟨ , ⟩ is G-invariant, let g_0 be an element of G. We must
show that ⟨g_0 v, g_0 w⟩ = ⟨v, w⟩ for all v, w ∈ V. By definition,


⟨g_0 v, g_0 w⟩ = (1/N) Σ_{g∈G} {gg_0 v, gg_0 w}.


There is an important trick for analyzing such a summation, based on the fact that
right multiplication by g_0 is a bijective map G → G. As g runs over the group, the
products gg_0 do too, in a different order. We change notation, substituting g' for
gg_0. Then in the sum, g' runs over the group. So we may as well write the sum as
being over g' ∈ G rather than over g ∈ G. This merely changes the order in which
the sum is taken. Then


⟨g_0 v, g_0 w⟩ = (1/N) Σ_{g∈G} {gg_0 v, gg_0 w} = (1/N) Σ_{g'∈G} {g'v, g'w} = ⟨v, w⟩,


as required. Please think this reindexing trick through and understand it. □
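
The reindexing trick can be seen concretely on a small group. The following sketch is an illustration under assumptions, not part of the proof: it models the symmetric group S_3 as permutation tuples, with a hypothetical helper `compose` and an arbitrarily chosen function `f` standing in for the summand.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q, i.e. i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))   # the six elements of S_3
g0 = (1, 2, 0)                     # a fixed element of G

# Right multiplication by g0 permutes the elements of G.
shifted = [compose(g, g0) for g in G]
is_bijection = sorted(shifted) == sorted(G)

# Hence a sum over g of f(g g0) equals the sum over g of f(g).
def f(g):
    return 3 * g[0] + g[1]         # an arbitrary function on G

total = sum(f(g) for g in G)
total_shifted = sum(f(compose(g, g0)) for g in G)
print(is_bijection, total == total_shifted)
```

The same argument works for any function on any finite group, which is all that the proof of G-invariance uses.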


Theorem (2.2) follows easily from Theorem (2.6). Any homomorphism
R: G → GL_n is the matrix representation associated to a representation (with
V = C^n and B = E). By Theorem (2.6), there is a G-invariant form ⟨ , ⟩ on V, and
we choose an orthonormal basis for V with respect to this form. The matrix representation
R' obtained via this basis is conjugate to R (1.10) and unitary [Chapter 7
(5.2)].


(2.9) Example. The matrix A = \begin{bmatrix} -1 & -1 \\ 1 & 0 \end{bmatrix} has order 3, and therefore it defines a
matrix representation {I, A, A²} of the cyclic group G of order 3. The averaging process
(2.7) will produce a G-invariant form from the standard hermitian product X*Y
on C². It is
(2.10) ⟨X, Y⟩ = (1/3)[X*Y + (AX)*(AY) + (A²X)*(A²Y)] = X*BY,
where
(2.11) B = (1/3)[I + A*A + (A²)*(A²)] = (2/3) \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.
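
The averaging process (2.7) is easy to carry out by machine. The sketch below is a minimal illustration assuming the cyclic group of order 3 is represented by the powers of the integer matrix A = [[-1, -1], [1, 0]] (an assumption made for this example); since A is real, the adjoint A* is just the transpose. The helper names are hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(m):
    return [list(row) for row in zip(*m)]

A = [[-1, -1], [1, 0]]                 # assumed matrix of order 3
I2 = [[1, 0], [0, 1]]
group = [I2, A, matmul(A, A)]          # {I, A, A^2}

# Matrix of the averaged form: B = (1/|G|) * sum over g of g^t g.
gram = [matmul(transpose(g), g) for g in group]
B = [
    [Fraction(sum(m[i][j] for m in gram), len(group)) for j in range(2)]
    for i in range(2)
]

# G-invariance of the form X^t B Y means A^t B A = B.
invariant = matmul(matmul(transpose(A), B), A) == B
print(B, invariant)
```

With this choice of A the averaged matrix has diagonal entries 4/3 and off-diagonal entries 2/3, and it is fixed by the substitution B ↦ A^t B A.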


3. COMPACT GROUPS 


A linear group is called compact if it is a closed and bounded subset of the space of 
matrices [Appendix (3.8)]. The most important compact groups are the orthogonal 
and unitary groups: 
(3.1) Proposition. The orthogonal and unitary groups are compact.


Proof. The columns of an orthogonal matrix P form an orthonormal basis, so
they have length 1. Hence all of the matrix entries have absolute value ≤ 1. This
shows that O_n is contained in the box defined by the inequalities |p_ij| ≤ 1. So it is a
bounded set. Because it is defined as the common zeros of a set of continuous functions,
it is closed too, hence compact. The proof for the unitary group is the same. □


The main theorems (2.2, 2.6) of Section 2 carry over to compact linear groups
without major change. We will work out the case of the circle group G = SO_2 as an
example. The rotation of the plane through the angle θ was denoted by p_θ in Chapter
5. Here we will consider an arbitrary representation of G. To avoid confusion, we
denote the element


(3.2) \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix} ∈ SO_2


by its angle θ, rather than by p_θ. Formula (3.2) defines a particular matrix representation
of our group, but there are others.

Suppose we are given a continuous representation σ of G on a finite-dimensional
space V, not necessarily the representation (3.2). Since the group law is addition
of angles, the rule for working with σ is σ_{θ+η} = σ_θ σ_η. To say that the operation
is continuous means that if we choose a basis for V, thereby representing the
operation of θ on V by some matrix S_θ, then the entries of S_θ are continuous functions
of θ.

Let us try to copy the proof of (2.6). To average over the infinite group G, we
replace summation by an integral. We choose any positive definite hermitian form
{ , } on V and define a new form by the rule


(3.3) ⟨v, w⟩ = (1/2π) ∫_0^{2π} {σ_θ v, σ_θ w} dθ.


This form has the required properties. To check G-invariance, fix any element
θ_0 ∈ G, and let η = θ + θ_0. Then dη = dθ. Hence


(3.4) ⟨σ_{θ_0} v, σ_{θ_0} w⟩ = (1/2π) ∫_0^{2π} {σ_θ σ_{θ_0} v, σ_θ σ_{θ_0} w} dθ
= (1/2π) ∫_0^{2π} {σ_η v, σ_η w} dη = ⟨v, w⟩,


as required.
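
The integral (3.3) can be approximated numerically. The sketch below is an illustration under assumptions: it takes the representation S_θ = P R_θ P^{-1}, where R_θ is the rotation matrix (3.2) and P is an arbitrarily chosen non-orthogonal matrix, starts from the standard dot product, and replaces the integral by a uniform Riemann sum. All names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(m):
    return [list(row) for row in zip(*m)]

P = [[1.0, 1.0], [0.0, 1.0]]       # assumed change of basis (not orthogonal)
P_inv = [[1.0, -1.0], [0.0, 1.0]]

def S(theta):
    """A non-unitary representation of the circle group: P R_theta P^{-1}."""
    c, s = math.cos(theta), math.sin(theta)
    return matmul(matmul(P, [[c, -s], [s, c]]), P_inv)

def averaged_gram(samples=360):
    """Riemann-sum approximation of (1/2pi) * integral of S_theta^t S_theta."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(samples):
        M = S(2 * math.pi * k / samples)
        G = matmul(transpose(M), M)
        for i in range(2):
            for j in range(2):
                total[i][j] += G[i][j] / samples
    return total

B = averaged_gram()                # matrix of the averaged form
alpha = 0.7                        # an arbitrary group element
shifted = matmul(matmul(transpose(S(alpha)), B), S(alpha))
max_error = max(abs(shifted[i][j] - B[i][j]) for i in range(2) for j in range(2))
print(B, max_error)
```

The matrices S_θ are not orthogonal, but the averaged form with matrix B is preserved by every S_α up to rounding error, as (3.4) predicts.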

We will not carry the proof through for general groups because some serious
work has to be done to find a suitable volume element analogous to dθ in a given
compact group G. In the computation (3.4), it is crucial that dθ = d(θ + θ_0), and
we were lucky that the obvious integral was the one to use.

For any compact group G there is a volume element dg called Haar measure,
which has the property of being translation invariant: If g_0 ∈ G is fixed and
g' = gg_0, then
(3.5) dg = dg'.


Using this measure, the proof carries over. We will not prove the existence of a 
Haar measure, but assuming one exists, the same reasoning as in (2.8) proves the 
following analogue of (2.6) and (2.2): 


(3.6) Corollary. Let G be a compact subgroup of GL_n. Then


(a) Let σ be a representation of G on a finite-dimensional vector space V. There is
a G-invariant, positive definite hermitian form ⟨ , ⟩ on V.

(b) Every continuous matrix representation R of G is conjugate to a unitary representation.

(c) Every compact subgroup G of GL_n is conjugate to a subgroup of U_n. □


4. G-INVARIANT SUBSPACES AND IRREDUCIBLE REPRESENTATIONS 


Given a representation of a finite group G on a vector space V, Corollary (2.3) tells
us that for each group element g there is a basis of V so that the matrix of the operator
p_g is diagonal. Obviously, it would be very convenient to have a single basis
which would diagonalize p_g for all group elements g at the same time. But such a basis
doesn't exist very often, because any two diagonal matrices commute with each
other. In order to diagonalize the matrices of all p_g at the same time, these operators
must commute. It follows that any group G which has a faithful representation by
diagonal matrices is abelian. We will see later (Section 8) that the converse is also
true. If G is a finite abelian group, then every matrix representation R of G is
diagonalizable; that is, there is a single matrix P so that P R_g P^{-1} is diagonal for all g ∈ G.
In this section we discuss what can be done for finite groups in general.

Let p be a representation of a group G on a vector space V. A subspace W of V is
called G-invariant if


(4.1) gw ∈ W, for all w ∈ W and g ∈ G.


So the operation by every group element g must carry W to itself, that is, gW ⊂ W.
This is an extension of the concept of T-invariant subspace introduced in Section 3
of Chapter 4. In a representation, the elements of G represent linear operators on V,
and we ask that W be an invariant subspace for each of these operators. If W is
G-invariant, the operation of G on V will restrict to an operation on W.

As an example, consider the three-dimensional representation of the dihedral
group defined by the symmetries of an n-gon Δ [Chapter 5 (9.1)]. So G = D_n.
There are two proper G-invariant subspaces: the plane containing Δ and the line
perpendicular to Δ. On the other hand, there is no proper T-invariant subspace for
the representation (1.2) of the tetrahedral group T, because there is no line or plane
which is carried to itself by every element of T.
If a representation p of a group G on a nonzero vector space V has no proper
G-invariant subspace, it is called an irreducible representation. If there is a proper 
invariant subspace, p is said to be reducible. The standard three-dimensional repre- 
sentation of T is irreducible. 

When V is the direct sum of G-invariant subspaces: V = W_1 ⊕ W_2, the representation
p on V is said to be the direct sum of its restrictions p_i to W_i, and we write


(4.2) p = p_1 ⊕ p_2.


Suppose this is the case. Choose bases B_1, B_2 of W_1, W_2, and let B = (B_1, B_2) be the
basis of V obtained by listing these two bases in order [Chapter 3 (6.6)]. Then the
matrix R_g of p_g will have the block form


(4.3) R_g = \begin{bmatrix} A_g & 0 \\ 0 & B_g \end{bmatrix},


where A_g is the matrix of p_{1g} with respect to B_1 and B_g is the matrix of p_{2g} with
respect to B_2. Conversely, if the matrices R_g have such a block form, then the representation
is a direct sum.

For example, consider the rotation group G = D_n operating on R³ by symmetries
of an n-gon Δ. If we choose an orthonormal basis B so that v_1 is perpendicular
to the plane of Δ and v_2 passes through a vertex, then the rotations corresponding
to our standard generators x, y [Chapter 5 (3.6)] are represented by the matrices


(4.4) R_x = \begin{bmatrix} 1 & & \\ & c_n & -s_n \\ & s_n & c_n \end{bmatrix}, R_y = \begin{bmatrix} -1 & & \\ & 1 & \\ & & -1 \end{bmatrix},


where c_n = cos(2π/n) and s_n = sin(2π/n). So R is a direct sum of a one-dimensional
representation A,


(4.5) A_x = [1], A_y = [−1],


and a two-dimensional representation B,


(4.6) B_x = \begin{bmatrix} c_n & -s_n \\ s_n & c_n \end{bmatrix}, B_y = \begin{bmatrix} 1 & \\ & -1 \end{bmatrix}.


The representation B is the basic two-dimensional representation of D_n as symmetries
of Δ in the plane.
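
To make the block decomposition concrete, the sketch below builds the two generator matrices for one value of n and checks numerically that they satisfy the dihedral relations x^n = 1, y² = 1, (xy)² = 1 and that products of them keep the 1 + 2 block shape. The choice n = 5, the matrix entries as written here, and all helper names are assumptions made for this illustration.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_power(m, k):
    result = [[float(i == j) for j in range(len(m))] for i in range(len(m))]
    for _ in range(k):
        result = matmul(result, m)
    return result

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(len(a)) for j in range(len(a)))

n = 5
c, s = math.cos(2 * math.pi / n), math.sin(2 * math.pi / n)
R_x = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
R_y = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Defining relations of the dihedral group.
relations_hold = (
    close(mat_power(R_x, n), I3)
    and close(matmul(R_y, R_y), I3)
    and close(mat_power(matmul(R_x, R_y), 2), I3)
)

def is_block_diagonal(m, tol=1e-9):
    """True if m has a 1x1 block followed by a 2x2 block."""
    return all(abs(m[0][j]) < tol and abs(m[j][0]) < tol for j in (1, 2))

blocks_ok = all(is_block_diagonal(m) for m in (R_x, R_y, matmul(R_x, R_y)))
print(relations_hold, blocks_ok)
```

Because the block shape is preserved under multiplication, every matrix in the generated group has it, which is the matrix form of the statement that the line and the plane are both G-invariant.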

On the other hand, even if a representation p is reducible, the matrices Ry will 
not have a block form unless the given basis for V is compatible with the direct sum 
decomposition. Until we have made a further analysis, it will be difficult to tell that 
a representation is reducible, when it is presented using the wrong basis. 


(4.7) Proposition. Let p be a unitary representation of G on a hermitian vector
space V, and let W be a G-invariant subspace. The orthogonal complement W^⊥ is
also G-invariant, and p is a direct sum of its restrictions to W and W^⊥.
Proof. Let v ∈ W^⊥, so that v ⊥ W. Since the operators p_g are unitary, they
preserve orthogonality [Chapter 7 (5.2)], so gv ⊥ gW. Since W is G-invariant,
W = gW, so gv ⊥ W. Therefore gv ∈ W^⊥. This shows that W^⊥ is G-invariant. We
know that V = W ⊕ W^⊥ by Chapter 7 (2.7). □


This proposition allows us to decompose a representation as a direct sum, pro- 
vided that there is a proper invariant subspace. Together with induction, this gives us 
the following corollary: 


(4.8) Corollary. Every unitary representation p: G → GL(V) on a hermitian
vector space V is a direct sum of irreducible representations. □


Combining this corollary with (2.2), we obtain the following:


(4.9) Corollary. Maschke's Theorem: Every representation of a finite group G is a
direct sum of irreducible representations. □
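
A small instance of Proposition (4.7) may help. The example below is not from the text: it assumes the permutation representation of the symmetric group S_3 on three coordinates, which is unitary for the standard dot product, takes W to be the line spanned by (1, 1, 1), and checks that both W and its orthogonal complement are carried to themselves. The helper names are hypothetical.

```python
from itertools import permutations

def perm_matrix(p):
    """Matrix sending the basis vector e_j to e_{p[j]}."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]

def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

G = [perm_matrix(p) for p in permutations(range(3))]

w = [1, 1, 1]                                   # spans the invariant line W
complement_basis = [[1, -1, 0], [0, 1, -1]]     # spans W-perp (coordinate sum zero)

W_invariant = all(apply(g, w) == w for g in G)
complement_invariant = all(
    dot(apply(g, u), w) == 0 for g in G for u in complement_basis
)
print(W_invariant, complement_invariant)
```

So this three-dimensional representation splits as a direct sum of a one-dimensional and a two-dimensional representation, in line with (4.9).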


5. CHARACTERS 


Two representations p: G → GL(V) and p': G → GL(V') of a group G are
called isomorphic, or equivalent, if there is an isomorphism of vector spaces
T: V → V' which is compatible with the operation of G:


(5.1) gT(v) = T(gv) or p'_g(T(v)) = T(p_g(v)),


for all v ∈ V and g ∈ G. If B is a basis for V and if B' = T(B) is the corresponding
basis of V', then the associated matrix representations R_g and R'_g will be equal.

For the next four sections, we restrict our attention to representations of finite 
groups. We will see that there are relatively few isomorphism classes of irreducible 
representations of a finite group. However, each representation has a complicated 
description in terms of matrices. The secret to understanding representations is not 
to write down the matrices explicitly unless absolutely necessary. So to facilitate 
classification we will throw out most of the information contained in a representation 
p, keeping only an essential part. What we will work with is the trace, called the 
character, of p. Characters are usually denoted by χ.

The character χ of a representation p is the map χ: G → C defined by


(5.2) χ(g) = trace(p_g).
If R is the matrix representation obtained from p by a choice of basis for V, then
(5.3) χ(g) = trace(R_g) = λ_1 + ··· + λ_n,


where λ_i are the eigenvalues of R_g, or of p_g.
The dimension of a character χ is defined to be the dimension of the representation
p. The character of an irreducible representation is called an irreducible
character.
Here are some basic properties of the character:


(5.4) Proposition. Let χ be the character of a representation p of a finite group G
on a vector space V.


(a) χ(1) is the dimension of the character [the dimension of V].


(b) χ(g) = χ(hgh^{-1}) for all g, h ∈ G. In other words, the character is constant on
each conjugacy class.


(c) χ(g^{-1}) = \overline{χ(g)} [the complex conjugate of χ(g)].
(d) If χ' is the character of another representation p', then the character of the
direct sum p ⊕ p' is χ + χ'.


Proof. The symbol 1 in assertion (a) denotes the identity element of G. This
property is trivial: χ(1) = trace I = dim V. Property (b) is true because the
matrix representation R associated to p is a homomorphism, which shows that
R_{hgh^{-1}} = R_h R_g R_h^{-1}, and because trace(R_h R_g R_h^{-1}) = trace R_g [Chapter 4 (4.18)].
Property (d) is also clear, because the trace of the block matrix (4.3) is the sum of
the traces of A_g and B_g.

Property (c) is less obvious. If the eigenvalues of R_g are λ_1,..., λ_n, then the
eigenvalues of R_{g^{-1}} = (R_g)^{-1} are λ_1^{-1},..., λ_n^{-1}. The assertion of (c) is


χ(g^{-1}) = λ_1^{-1} + ··· + λ_n^{-1} = \overline{λ_1} + ··· + \overline{λ_n} = \overline{χ(g)},


and to show this we use the fact that G is a finite group. Every element g of G has
finite order. If g^r = 1, then (R_g)^r = I, so the eigenvalues λ_1,..., λ_n of R_g are
roots of unity. This implies that |λ_i| = 1, hence that λ_i^{-1} = \overline{λ_i} for each i. □


In order to avoid confusing cyclic groups with conjugacy classes, we will denote
conjugacy classes by the roman letter C, rather than an italic C, in this chapter.
Thus the conjugacy class of an element g ∈ G will be denoted by C_g.

We shall note two things which simplify the computation of a character. First
of all, since the value of $\chi$ depends only on the conjugacy class of an element $g \in G$
(5.4b), we need only determine the values of $\chi$ on one representative element in
each class. Second, since the value of the character $\chi(g)$ is the trace of the operator
$\rho_g$, and since the trace doesn't depend on the choice of a basis, we are free to choose
a convenient one. Moreover, we may select a convenient basis for each individual
group element. There is no need to use the same basis for all elements.

As an example, let us determine the character $\chi$ of the rotation representation
of the tetrahedral group T defined by (1.2). There are four conjugacy classes in T,
and they are represented by the elements $1, x, x^2, y$, where as before $x$ is a rotation
by $2\pi/3$ about a vertex and $y$ is a rotation by $\pi$ about the center of an edge. The
values of the character on these representatives can be read off from the matrices
(1.2):

(5.5)    $\chi(1) = 3, \quad \chi(x) = 0, \quad \chi(x^2) = 0, \quad \chi(y) = -1.$


It is sometimes useful to think of a character $\chi$ as a vector. We can do this by
listing the elements of G in some order: $G = \{g_1, \ldots, g_N\}$; then the vector representing
$\chi$ will be

(5.6)    $\chi = (\chi(g_1), \ldots, \chi(g_N)).$


Since $\chi$ is constant on conjugacy classes, it is natural to list G by listing the
conjugacy classes and then running through each conjugacy class in some order. If we
do this for the character (5.5), listing $\mathrm{C}_1, \mathrm{C}_x, \mathrm{C}_{x^2}, \mathrm{C}_y$ in that order, the vector we
obtain is

(5.7)    $\chi = (3;\ 0, 0, 0, 0;\ 0, 0, 0, 0;\ -1, -1, -1).$


We will not write out such a vector explicitly again. 

The main theorem on characters relates them to the hermitian dot product on
$\mathbb{C}^N$. This is one of the most beautiful theorems of algebra, both because its statement
is intrinsically so elegant and because it simplifies the problem of classifying
representations so much. We define

(5.8)    $\langle \chi, \chi' \rangle = \frac{1}{N} \sum_g \overline{\chi(g)}\, \chi'(g),$

where $N = |G|$. If $\chi, \chi'$ are represented by vectors as in (5.7), this is the standard
hermitian product, renormalized by the factor $1/N$.


(5.9) Theorem. Let G be a group of order N, let $\rho_1, \rho_2, \ldots$ represent the distinct
isomorphism classes of irreducible representations of G, and let $\chi_i$ be the character
of $\rho_i$.

(a) Orthogonality Relations: The characters $\chi_i$ are orthonormal. In other words,
$\langle \chi_i, \chi_j \rangle = 0$ if $i \neq j$, and $\langle \chi_i, \chi_i \rangle = 1$ for each $i$.

(b) There are finitely many isomorphism classes of irreducible representations, the
same number as the number of conjugacy classes in the group.

(c) Let $d_i$ be the dimension of the irreducible representation $\rho_i$, and let $r$ be the
number of irreducible representations. Then $d_i$ divides N, and

(5.10)    $N = d_1^2 + \cdots + d_r^2.$


This theorem will be proved in Section 9, with the exception of the assertion
that $d_i$ divides N, which we will not prove.


A complex-valued function $\phi: G \to \mathbb{C}$ which is constant on each conjugacy
class is called a class function. Since a class function is constant on each class, it
may also be described as a function on the set of conjugacy classes. The class
functions form a complex vector space, which we denote by $\mathcal{C}$. We use the form defined
by (5.8) to make $\mathcal{C}$ into a hermitian space.


(5.11) Corollary. The irreducible characters form an orthonormal basis of $\mathcal{C}$.

This follows from (5.9a,b). The characters are linearly independent because
they are orthogonal, and they span because the dimension of $\mathcal{C}$ is the number of
conjugacy classes, which is $r$. □


The corollary allows us to decompose a given character as a linear combina- 
tion of the irreducible characters, using the formula for orthogonal projection 
[Chapter 7 (3.8)]. For let $\chi$ be the character of a representation $\rho$. By Corollary
(4.9), $\rho$ is isomorphic to a direct sum of the irreducible representations $\rho_1, \ldots, \rho_r$;
say we write this symbolically as $\rho = n_1\rho_1 \oplus \cdots \oplus n_r\rho_r$, where $n_i$ are nonnegative
integers and where $n\rho$ stands for the direct sum of $n$ copies of the representation $\rho$.
Then $\chi = n_1\chi_1 + \cdots + n_r\chi_r$. Since $(\chi_1, \ldots, \chi_r)$ is an orthonormal basis, we have the
following:


(5.12) Corollary. Let $\chi_1, \ldots, \chi_r$ be the irreducible characters of a finite group G,
and let $\chi$ be any character. Then $\chi = n_1\chi_1 + \cdots + n_r\chi_r$, where $n_i = \langle \chi, \chi_i \rangle$. □


(5.13) Corollary. If two representations $\rho, \rho'$ have the same character, they are
isomorphic.


For let $\chi, \chi'$ be the characters of two representations $\rho, \rho'$, where
$\rho = n_1\rho_1 \oplus \cdots \oplus n_r\rho_r$ and $\rho' = n_1'\rho_1 \oplus \cdots \oplus n_r'\rho_r$. Then the characters of these
representations are $\chi = n_1\chi_1 + \cdots + n_r\chi_r$ and $\chi' = n_1'\chi_1 + \cdots + n_r'\chi_r$. Since $\chi_1, \ldots, \chi_r$
are linearly independent, $\chi = \chi'$ implies that $n_i = n_i'$ for each $i$. □


(5.14) Corollary. A character $\chi$ has the property $\langle \chi, \chi \rangle = 1$ if and only if it is
irreducible.


For if $\chi = n_1\chi_1 + \cdots + n_r\chi_r$, then $\langle \chi, \chi \rangle = n_1^2 + \cdots + n_r^2$. This gives the value 1
if and only if a single $n_i$ is 1 and the rest are zero. □


The evaluation of $\langle \chi, \chi \rangle$ is a very practical way to check irreducibility of a
representation. For example, let $\chi$ be the character (5.7) of the representation (1.2).
Then $\langle \chi, \chi \rangle = (3^2 + 1 + 1 + 1)/12 = 1$. So $\chi$ is irreducible.
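The same check can be organized by conjugacy class rather than by group element, since each class contributes its size times a single value. The following is a minimal sketch of that bookkeeping; the function name `inner` and the list layout are our own choices for illustration, not notation from the text.

```python
def inner(class_sizes, chi, psi):
    """Form (5.8), summed class by class: (1/N) * sum of c_i * conj(chi_i) * psi_i."""
    order = sum(class_sizes)
    total = sum(c * complex(a).conjugate() * b
                for c, a, b in zip(class_sizes, chi, psi))
    return total / order

# Tetrahedral group T: the classes of 1, x, x^2, y have 1, 4, 4, 3 elements.
sizes_T = [1, 4, 4, 3]
chi_rot = [3, 0, 0, -1]      # character (5.5) of the rotation representation

norm = inner(sizes_T, chi_rot, chi_rot)
print(norm)
```

The printed value should be 1 up to floating-point formatting, in agreement with the hand computation.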

Part (c) of Theorem (5.9) should be contrasted with the Class Equation
[Chapter 6 (1.7)]. Let $\mathrm{C}_1, \ldots, \mathrm{C}_r$ be the conjugacy classes in G, and let $c_i = |\mathrm{C}_i|$ be
the order of the conjugacy class. Then $c_i$ divides N, and $N = c_1 + \cdots + c_r$. Though
there is the same number of conjugacy classes as irreducible representations, their
exact relationship is very subtle.

As our first example, we will determine the irreducible representations of the
dihedral group $D_3$ [Chapter 5 (3.6)]. There are three conjugacy classes, $\mathrm{C}_1 = \{1\}$,
$\mathrm{C}_2 = \{y, xy, x^2y\}$, $\mathrm{C}_3 = \{x, x^2\}$ [Chapter 6 (1.8)], and therefore three irreducible
representations. The only solution of equation (5.10) is $6 = 1^2 + 1^2 + 2^2$, so $D_3$ has
two one-dimensional representations $\rho_1, \rho_2$ and one irreducible two-dimensional
representation $\rho_3$. Every group G has the trivial one-dimensional representation
($R_g = 1$ for all $g$); let us call it $\rho_1$. The other one-dimensional representation is the
sign representation of the symmetric group $S_3$, which is isomorphic to $D_3$:
$R_g = \mathrm{sign}(g) = \pm 1$. This is the representation (4.5); let us call it $\rho_2$. The
two-dimensional representation is defined by (4.6); call it $\rho_3$.

Rather than listing the characters $\chi_i$ as vectors, we usually assemble them into
a character table. In this table, the three conjugacy classes are represented by the
elements $1, y, x$. The orders of the conjugacy classes are given above them. Thus
$|\mathrm{C}_2| = 3$.


                  (1)   (3)   (2)     order of the class
                   1     y     x      representative element
    $\chi_1$       1     1     1
    $\chi_2$       1    -1     1      value of the character
    $\chi_3$       2     0    -1


(5.15)  CHARACTER TABLE FOR $D_3$


In such a table, the top row, corresponding to the trivial character, consists
entirely of 1's. The first column contains the dimensions of the representations,
because $\chi_i(1) = \dim \rho_i$.

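To evaluate the hermitian form (5.8) on the characters, remember that there are
three elements in the class of $y$ and two in the class of $x$. Thus

$\langle \chi_3, \chi_3 \rangle = \bigl(1 \cdot \overline{\chi_3(1)}\chi_3(1) + 3 \cdot \overline{\chi_3(y)}\chi_3(y) + 2 \cdot \overline{\chi_3(x)}\chi_3(x)\bigr)/6$

$\qquad = \bigl(1 \cdot 2 \cdot 2 + 3 \cdot 0 \cdot 0 + 2 \cdot (-1) \cdot (-1)\bigr)/6 = 1.$


This confirms the fact that $\rho_3$ is irreducible.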
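All nine products of the three characters of $D_3$ can be tabulated at once. The sketch below does this in exact arithmetic; the dictionary keys `chi1`, `chi2`, `chi3` and the helper `inner` are illustrative names, and the character values are the ones read off from table (5.15).

```python
from fractions import Fraction

sizes = [1, 3, 2]                 # classes of 1, y, x in D_3
table = {
    "chi1": [1, 1, 1],
    "chi2": [1, -1, 1],
    "chi3": [2, 0, -1],
}

def inner(chi, psi):
    # Every value here is a real integer, so complex conjugation is omitted.
    return Fraction(sum(c * a * b for c, a, b in zip(sizes, chi, psi)), sum(sizes))

gram = {(m, n): inner(table[m], table[n]) for m in table for n in table}
for (m, n), value in gram.items():
    print(m, n, value)
```

Orthonormality means that the entry for a pair (m, n) is 1 when m = n and 0 otherwise.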




As another example, consider the cyclic group $C_3 = \{1, x, x^2\}$ of order 3.
Since $C_3$ is abelian, there are three conjugacy classes, each consisting of one
element. Theorem (5.9) shows that there are three irreducible representations, and that
each has dimension 1. Let $\zeta = \frac{1}{2}(-1 + \sqrt{3}\,i)$ be a cube root of 1. The three
representations are

(5.16)    $\rho_1(x) = 1, \quad \rho_2(x) = \zeta, \quad \rho_3(x) = \zeta^2.$


                  (1)   (1)   (1)
                   1     x    x^2
    $\chi_1$       1     1     1
    $\chi_2$       1    $\zeta$   $\zeta^2$
    $\chi_3$       1    $\zeta^2$  $\zeta$


(5.17)  CHARACTER TABLE FOR $C_3$

Note that $\overline{\zeta} = \zeta^2$. So

$\langle \chi_2, \chi_3 \rangle = (\overline{1} \cdot 1 + \overline{\zeta}\,\zeta^2 + \overline{\zeta^2}\,\zeta)/3 = (1 + \zeta + \zeta^2)/3 = 0,$

which agrees with the orthogonality relations.
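The same computation works for a cyclic group of any order $n$, whose characters send a generator to the powers of a primitive $n$th root of 1. Here is a small numerical sketch; the function names are hypothetical, and rows are indexed from 0, so row 0 is the trivial character.

```python
import cmath

def cyclic_character_table(n):
    """Row j, column k holds chi_j(x^k) = zeta^(j*k), with zeta = exp(2*pi*i/n)."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [[zeta ** (j * k) for k in range(n)] for j in range(n)]

def inner(chi, psi):
    # Each conjugacy class of an abelian group has exactly one element.
    return sum(a.conjugate() * b for a, b in zip(chi, psi)) / len(chi)

table = cyclic_character_table(3)
gram = [[inner(r, s) for s in table] for r in table]
```

Up to rounding error, `gram` is the 3 x 3 identity matrix.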

As a third example, let us determine the character table of the tetrahedral
group T. The conjugacy classes $\mathrm{C}_1, \mathrm{C}_x, \mathrm{C}_{x^2}, \mathrm{C}_y$ were determined above, and
the Class Equation is $12 = 1 + 4 + 4 + 3$. The only solution of (5.10) is
$12 = 1^2 + 1^2 + 1^2 + 3^2$, so there are four irreducible representations, of dimensions
1, 1, 1, 3. Now it happens that T has a normal subgroup H of order 4 which is
isomorphic to the Klein four group, and such that the quotient $\overline{T} = T/H$ is cyclic of
order 3. Any representation $\overline{\rho}$ of $\overline{T}$ will give a representation of T by composition:

$T \xrightarrow{\ \pi\ } \overline{T} \xrightarrow{\ \overline{\rho}\ } GL(V).$

Thus the three one-dimensional representations of the cyclic group determine
representations of T. Their characters $\chi_1, \chi_2, \chi_3$ can be determined from (5.17). The
character (5.5) is denoted by $\chi_4$ in the table below.


                  (1)   (4)   (4)   (3)
                   1     x    x^2    y
    $\chi_1$       1     1     1     1
    $\chi_2$       1    $\zeta$   $\zeta^2$   1
    $\chi_3$       1    $\zeta^2$  $\zeta$    1
    $\chi_4$       3     0     0    -1


(5.18)  CHARACTER TABLE FOR T


Various properties of the group can be read off easily from the character table.
Let us forget that this is the character table for T, and suppose that it has been given
to us as the character table of an unknown group G. After all, it is conceivable that
another isomorphism class of groups has the same characters.

The order of G is 12, the sum of the orders of the conjugacy classes. Next,
since the dimension of $\rho_2$ is 1, $\chi_2(y)$ is the trace of the $1 \times 1$ matrix $\rho_{2y}$. So the fact
that $\chi_2(y) = 1$ shows that $\rho_{2y} = 1$ too, that is, that $y$ is in the kernel of $\rho_2$. In fact,
the kernel of $\rho_2$ is identified as the union of the two conjugacy classes $\mathrm{C}_1 \cup \mathrm{C}_y$. This
is a subgroup H of order 4 in G. Moreover, H is the Klein four group. For if H were
$C_4$, its unique element of order 2 would have to be in a conjugacy class by itself. It
also follows from the value of $\chi_2(x)$ that the order of $x$ is divisible by 3. Going back
to our list [Chapter 6 (5.1)] of groups of order 12, we see that $G \approx T$.


6. PERMUTATION REPRESENTATIONS AND 
THE REGULAR REPRESENTATION 


Let S be a set. We can construct a representation of a group G from an operation of
G on S, by passing to the vector space $V = V(S)$ of formal linear combinations
[Chapter 3 (3.21)]

$v = \sum_i a_i s_i, \quad a_i \in \mathbb{C}.$

An element $g \in G$ operates on vectors by permuting the elements of S, leaving the
coefficients alone:


(6.1)    $gv = \sum_i a_i\, g s_i.$


If we choose an ordering $s_1, \ldots, s_n$ of S and take the basis $(s_1, \ldots, s_n)$ for V, then
$R_g$ is the permutation matrix which describes the operation of $g$ on S.

For example, let $G = T$ and let S be the set of faces of the tetrahedron:
$S = (f_1, \ldots, f_4)$. The operation of G on S defines a four-dimensional representation
of G. Let $x$ denote the rotation by $2\pi/3$ about a face $f_1$ and $y$ the rotation by $\pi$ about
an edge as before. Then if the faces are numbered appropriately, we will have

(6.2)    $R_x = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad R_y = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$

We will call $\rho$ (or $R$) the representation associated to the operation of G on S
and will often refer to $\rho$ as a permutation representation, though that expression has
a meaning in another context as well (Chapter 5, Section 8).

If we decompose a set on which G operates into orbits, we will obtain a
decomposition of the associated representation as a direct sum. This is clear. But there
is an important new feature: The fact that linear combinations are available in
$V(S)$ allows us to decompose the representation further. Even though S may consist
of a single orbit, the associated permutation representation $\rho$ will never be
irreducible, unless S has only one element. This is because the vector $w = s_1 + \cdots + s_n$
is fixed by every permutation of the basis, and so the one-dimensional subspace
$W = \{cw\}$ is G-invariant. The trivial representation is a summand of every
permutation representation.

It is easy to compute the character of a permutation representation:


(6.3)    $\chi(g)$ = number of elements of S fixed by $g$,


because for every index fixed by a permutation, there is a 1 on the diagonal of the
associated permutation matrix, and the other diagonal entries are 0. For example,
the character $\chi$ of the representation of T on the faces of a tetrahedron is


                   1     x    x^2    y
(6.4)
    $\chi$         4     1     1     0


and the character table (5.18) shows that $\chi = \chi_1 + \chi_4$. Therefore $\rho \approx \rho_1 \oplus \rho_4$ by
Corollary (5.13). As another example, the character of the operation of T on the six
edges of the tetrahedron is


                   1     x    x^2    y
(6.5)
    $\chi$         6     0     0     2


and using (5.18) again, we find that $\chi = \chi_1 + \chi_2 + \chi_3 + \chi_4$.
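Both decompositions can be found mechanically from the projection formula (5.12). The following sketch computes the multiplicities $n_i = \langle \chi, \chi_i \rangle$ numerically; the list `irreducibles` transcribes the four characters of T (the three lifted from the cyclic quotient and the rotation character), and the helper names are our own.

```python
import cmath

zeta = cmath.exp(2j * cmath.pi / 3)
sizes = [1, 4, 4, 3]                      # classes of 1, x, x^2, y
irreducibles = [
    [1, 1, 1, 1],
    [1, zeta, zeta**2, 1],
    [1, zeta**2, zeta, 1],
    [3, 0, 0, -1],
]

def inner(chi, psi):
    total = sum(c * complex(a).conjugate() * b for c, a, b in zip(sizes, chi, psi))
    return total / sum(sizes)

def multiplicities(chi):
    """n_i = <chi, chi_i>, rounded because the exact value is an integer."""
    return [round(inner(chi, chi_i).real) for chi_i in irreducibles]

faces = multiplicities([4, 1, 1, 0])      # operation on the four faces
edges = multiplicities([6, 0, 0, 2])      # operation on the six edges
print(faces, edges)
```

The rounding step is only safe because the input is known to be a character, so that each product is a nonnegative integer.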
The regular representation $\rho^{\mathrm{reg}}$ of G is the representation associated to the
operation of G on itself by left multiplication. In other words, we let $S = G$, with the
operation of left multiplication. This is not an especially interesting operation, but its
associated representation is very interesting. Its character $\chi^{\mathrm{reg}}$ is particularly simple:


(6.6)    $\chi^{\mathrm{reg}}(1) = N$, and $\chi^{\mathrm{reg}}(g) = 0$, if $g \neq 1$,


where $N = |G|$. The first formula is clear: $\chi(1) = \dim \rho$ for any representation $\rho$, and
$\rho^{\mathrm{reg}}$ has dimension N. The second follows from (6.3), because multiplication by $g$
does not fix any element of G, unless $g = 1$.

Because of this formula, it is easy to compute $\langle \chi^{\mathrm{reg}}, \chi \rangle$ for the character $\chi$ of
any representation $\rho$ by the orthogonal projection formula (5.12). The answer is

(6.7)    $\langle \chi^{\mathrm{reg}}, \chi \rangle = \dim \rho,$


because only the term $g = 1$ contributes to the sum (5.8), and $\chi(1) = \dim \rho$. This
allows us to write $\chi^{\mathrm{reg}}$ as a linear combination of the irreducible characters:


(6.8) Corollary. $\chi^{\mathrm{reg}} = d_1\chi_1 + \cdots + d_r\chi_r$, and $\rho^{\mathrm{reg}} \approx d_1\rho_1 \oplus \cdots \oplus d_r\rho_r$, where
$d_i$ is the dimension of $\rho_i$ and $d_i\rho_i$ stands for the direct sum of $d_i$ copies of $\rho_i$. □


Isn't this a nice formula? We can deduce formula (5.10) from (6.8) by
counting dimensions. This shows that formula (5.10) of Theorem (5.9) follows from the
orthogonality relations.

For instance, for the group $D_3$, the character of the regular representation is


                          1     y     x
    $\chi^{\mathrm{reg}}$       6     0     0


and Table (5.15) shows that $\chi^{\mathrm{reg}} = \chi_1 + \chi_2 + 2\chi_3$, as expected.
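One can also build the regular character directly from the multiplication of the group and then project. The sketch below does this for $S_3$, which is isomorphic to $D_3$. It assumes the usual description of the two-dimensional character as the permutation character on three points minus the trivial character; the function names are illustrative only.

```python
from itertools import permutations

# S_3, each element stored as a tuple p with p[i] the image of i.
elements = list(permutations(range(3)))

def compose(p, q):
    """Return the permutation 'first q, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))

def sign(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def fixed_points(p):
    return sum(1 for i, image in enumerate(p) if i == image)

# (6.3) for left multiplication: count the s in G with g*s == s.
chi_reg = [sum(1 for s in elements if compose(g, s) == s) for g in elements]

# Irreducible characters listed element by element (all values are real).
chi1 = [1 for g in elements]
chi2 = [sign(g) for g in elements]
chi3 = [fixed_points(g) - 1 for g in elements]   # permutation character minus chi1

def multiplicity(chi):
    return sum(a * b for a, b in zip(chi_reg, chi)) // len(elements)

coefficients = [multiplicity(chi) for chi in (chi1, chi2, chi3)]
```

Here `chi_reg` vanishes away from the identity, and `coefficients` lists the dimensions 1, 1, 2 of the irreducible representations, as Corollary (6.8) predicts.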
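As another example, consider the regular representation $R$ of the cyclic group
$\{1, x, x^2\}$ of order 3. The permutation matrix representing $x$ is

$R_x = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$

Its eigenvalues are $1, \zeta, \zeta^2$, where $\zeta = \frac{1}{2}(-1 + \sqrt{3}\,i)$. Thus $R_x$ is conjugate to

$R_x' = \begin{bmatrix} 1 & & \\ & \zeta & \\ & & \zeta^2 \end{bmatrix}.$

This matrix displays the decomposition $\rho^{\mathrm{reg}} \approx \rho_1 \oplus \rho_2 \oplus \rho_3$ of the regular
representation into irreducible one-dimensional representations.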
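The eigenvalues can be confirmed by exhibiting eigenvectors. In the sketch below, the candidate vectors $(1, \lambda^{-1}, \lambda^{-2})$ are our own choice, made so that cyclically shifting the coordinates multiplies the vector by $\lambda$ when $\lambda^3 = 1$.

```python
import cmath

R_x = [[0, 0, 1],
       [1, 0, 0],
       [0, 1, 0]]
zeta = cmath.exp(2j * cmath.pi / 3)

def apply(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

residuals = []
for k in range(3):
    eigenvalue = zeta ** k
    # Candidate eigenvector (1, eigenvalue^-1, eigenvalue^-2).
    vector = [eigenvalue ** (-j) for j in range(3)]
    image = apply(R_x, vector)
    residuals.append(max(abs(a - eigenvalue * b) for a, b in zip(image, vector)))
```

Each entry of `residuals` measures how far `R_x * vector` is from `eigenvalue * vector`; all three are zero up to rounding.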


7. THE REPRESENTATIONS OF THE ICOSAHEDRAL GROUP


In this section we determine the irreducible characters of the icosahedral group. So
far, we have seen only its trivial representation $\rho_1$ and the representation of
dimension 3 as a rotation group. Let us denote the rotation representation by $\rho_2$. There are
five conjugacy classes in I [Chapter 6 (2.2)], namely

(7.1)    $\mathrm{C}_1 = \{1\}$,
         $\mathrm{C}_2$ = 15 rotations "$x$" through the angle $\pi$,
         $\mathrm{C}_3$ = 20 rotations "$y$" by $2\pi/3$, $4\pi/3$,
         $\mathrm{C}_4$ = 12 rotations "$z$" by $2\pi/5$, $8\pi/5$,
         $\mathrm{C}_5$ = 12 rotations "$z^2$" by $4\pi/5$, $6\pi/5$,


and therefore there are three more irreducible representations. Given what we know
already, the only solution to (5.10) is $d_i = 1, 3, 3, 4, 5$:

$60 = 1^2 + 3^2 + 3^2 + 4^2 + 5^2.$


We denote the remaining representations by $\rho_3, \rho_4, \rho_5$, where $\dim \rho_3 = 3$, and
so on. A good way to find the missing irreducible representations is to decompose
some known permutation representations. We know that I operates on a set of five
elements [Chapter 6 (2.6)]. This gives us a five-dimensional representation $\rho'$. As
we saw in Section 6, the trivial representation is a summand of $\rho'$. Its orthogonal
complement turns out to be the required irreducible four-dimensional representation:
$\rho' = \rho_1 \oplus \rho_4$. Also, I permutes the set of six axes through the centers of opposite
faces of the dodecahedron. Let the corresponding six-dimensional representation be
$\rho''$. Then $\rho'' = \rho_1 \oplus \rho_5$. We can check this by computing the characters of $\rho_4$ and $\rho_5$
and applying Theorem (5.9). The characters $\chi_4, \chi_5$ are computed from $\chi', \chi''$ by
subtracting $\chi_1 = 1$ from each value (5.4d). For example, $\rho'$ realizes $x$ as an even
permutation of $\{1, \ldots, 5\}$ of order 2, so it is a product of two disjoint transpositions,
which fixes one index. Therefore $\chi'(x) = 1$, and $\chi_4(x) = 0$.

The second three-dimensional representation $\rho_3$ is fairly subtle because it is so
similar to $\rho_2$. It can be obtained this way: Since I is isomorphic to $A_5$, we may view
it as a normal subgroup of the symmetric group $S_5$. Conjugation by an element $p$ of
$S_5$ which is not in $A_5$ defines an automorphism $\sigma$ of $A_5$. This automorphism
interchanges the two conjugacy classes $\mathrm{C}_4, \mathrm{C}_5$. The other conjugacy classes are not
interchanged, because their elements have different orders. For example, in cycle
notation, let $z = (12345)$ and let $p = (2354)$. Then $p^{-1}zp = (4532)(12345)(2354) =
(13524) = z^2$. The representation $\rho_3$ is $\rho_2 \circ \sigma$.
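The conjugation can be checked by machine. The sketch below assumes that a product of cycles is applied from left to right, which is the convention under which the displayed identity holds; the helper names `from_cycle` and `product` are hypothetical.

```python
def from_cycle(cycle, n=5):
    """Mapping {i: image} on 1..n for a single cycle such as (1, 2, 3, 4, 5)."""
    perm = {i: i for i in range(1, n + 1)}
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return perm

def product(*perms):
    """Apply the factors from left to right (an assumed convention)."""
    result = {}
    for i in perms[0]:
        image = i
        for perm in perms:
            image = perm[image]
        result[i] = image
    return result

z = from_cycle((1, 2, 3, 4, 5))
p = from_cycle((2, 3, 5, 4))            # a 4-cycle, hence an odd permutation
p_inv = {image: i for i, image in p.items()}

conjugate = product(p_inv, z, p)
z_squared = product(z, z)
```

With the factors applied from right to left instead, the same product comes out as $z^3$, which lies in the same conjugacy class of $A_5$ as $z^2$, so the conclusion about the two classes does not depend on the convention.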

The character of $\rho_3$ is computed from that of $\rho_2$ by interchanging the values for
$z, z^2$. Once these characters are computed, verification of the relations $\langle \chi_i, \chi_j \rangle = 0$,
$\langle \chi_i, \chi_i \rangle = 1$ shows that the representations are irreducible and that our list is correct.


                  (1)   (15)  (20)  (12)  (12)
                   1     x     y     z    z^2
    $\chi_1$       1     1     1     1     1
    $\chi_2$       3    -1     0    $\alpha$   $\beta$
    $\chi_3$       3    -1     0    $\beta$    $\alpha$
    $\chi_4$       4     0     1    -1    -1
    $\chi_5$       5     1    -1     0     0


(7.2)  CHARACTER TABLE FOR $I \approx A_5$

In this table, $\alpha$ is the trace of a three-dimensional rotation through the angle $2\pi/5$,
which is

$\alpha = 1 + 2\cos 2\pi/5 = \frac{1}{2}(1 + \sqrt{5}),$

and $\beta$ is computed similarly: $\beta = 1 + 2\cos 4\pi/5 = \frac{1}{2}(1 - \sqrt{5})$.
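The verification of the orthogonality relations for the icosahedral group is a finite computation that is easy to delegate. The sketch below uses floating-point values for $\alpha$ and $\beta$; the five rows are the characters obtained by the constructions of this section (trivial, rotation, rotation with $z$ and $z^2$ interchanged, and the two permutation characters minus 1), written out here as an assumption of the example.

```python
import math

alpha = 1 + 2 * math.cos(2 * math.pi / 5)
beta = 1 + 2 * math.cos(4 * math.pi / 5)

sizes = [1, 15, 20, 12, 12]            # classes of 1, x, y, z, z^2
table = [
    [1, 1, 1, 1, 1],
    [3, -1, 0, alpha, beta],
    [3, -1, 0, beta, alpha],
    [4, 0, 1, -1, -1],
    [5, 1, -1, 0, 0],
]

def inner(chi, psi):
    # All values are real, so no complex conjugation is needed.
    return sum(c * a * b for c, a, b in zip(sizes, chi, psi)) / sum(sizes)

gram = [[inner(r, s) for s in table] for r in table]
```

Up to rounding, `gram` is the 5 x 5 identity matrix. The key facts behind this are $\alpha + \beta = 1$ and $\alpha\beta = -1$.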


8. ONE-DIMENSIONAL REPRESENTATIONS 


Let $\rho$ be a one-dimensional representation of a group G. So $R_g$ is a $1 \times 1$ matrix, and
$\chi(g) = R_g$, provided that we identify a $1 \times 1$ matrix with its single entry. Therefore
in this case the character $\chi$ is a homomorphism $\chi: G \to \mathbb{C}^{\times}$, that is, it satisfies the
rule


(8.1)    $\chi(gh) = \chi(g)\chi(h)$, if $\dim \rho = 1$.


Such a character is called abelian. Please note that formula (8.1) is not true for
characters of dimension $> 1$.

If G is a finite group, the values taken on by an abelian character $\chi$ are always
roots of 1:


(8.2)    $\chi(g)^r = 1$


for some $r$, because the element $g$ has finite order.
The one-dimensional characters form a group under multiplication of
functions:


(8.3)    $\chi\chi'(g) = \chi(g)\chi'(g).$


This group is called the character group of G and is often denoted by $\hat{G}$. The
character group is especially important when G is abelian, because of the following fact:


(8.4) Theorem. If G is a finite abelian group, then every irreducible
representation of G is one-dimensional.


Proof. Since G is abelian, every conjugacy class consists of one element. So
the number of conjugacy classes is N. By Theorem (5.9), there are $r = N$ irreducible
representations. Since $N = d_1^2 + \cdots + d_r^2$ is then a sum of N positive integers, we must have
$d_1 = d_2 = \cdots = d_r = 1$. □


9. SCHUR'S LEMMA, AND PROOF OF THE
ORTHOGONALITY RELATIONS


Let $\rho, \rho'$ be representations of a group G on two vector spaces $V, V'$. We will call a
linear transformation $T: V \to V'$ G-invariant if it is compatible with the two
operations of G on V and $V'$, that is, if

(9.1)    $gT(v) = T(gv)$, or $\rho'_g(T(v)) = T(\rho_g(v))$,


for all $g \in G$ and $v \in V$. Thus an isomorphism of representations (Section 5) is a
bijective G-invariant transformation. We could also write (9.1) as

(9.2)    $\rho'_g \circ T = T \circ \rho_g$, for all $g \in G$.


Let bases $\mathbf{B}, \mathbf{B}'$ for V and $V'$ be given, and let $R_g, R'_g$ and $A$ denote the matrices
of $\rho_g, \rho'_g$ and $T$ with respect to these bases. Then (9.2) reads


(9.3)    $R'_g A = A R_g$, for all $g \in G$.


The special case that $\rho = \rho'$ is very important. A G-invariant linear operator
$T$ on V is one which commutes with $\rho_g$ for every $g \in G$:

(9.4)    $\rho_g \circ T = T \circ \rho_g$ or $R_g A = A R_g$.


These formulas just repeat (9.2) and (9.3) when $\rho = \rho'$.


(9.5) Proposition. The kernel and image of a G-invariant linear transformation
$T: V \to V'$ are G-invariant subspaces of V and $V'$ respectively.


Proof. The kernel and image of any linear transformation are subspaces. Let
us show that $\ker T$ is G-invariant: We want to show that $gv \in \ker T$ if $v \in \ker T$,
or that $T(gv) = 0$ if $T(v) = 0$. Well,

$T(gv) = gT(v) = g0 = 0.$

Similarly, if $v' \in \mathrm{im}\, T$, then $v' = T(v)$ for some $v \in V$. Then

$gv' = gT(v) = T(gv),$

so $gv' \in \mathrm{im}\, T$ too. □


(9.6) Theorem. Schur's Lemma: Let $\rho, \rho'$ be two irreducible representations of G
on vector spaces $V, V'$, and let $T: V \to V'$ be a G-invariant transformation.


(a) Either $T$ is an isomorphism, or else $T = 0$.
(b) If $V = V'$ and $\rho = \rho'$, then $T$ is multiplication by a scalar.


Proof. (a) Since $\rho$ is irreducible and since $\ker T$ is a G-invariant subspace,
$\ker T = V$ or else $\ker T = 0$. In the first case, $T = 0$. In the second case, $T$ is
injective and maps V isomorphically to its image. Then $\mathrm{im}\, T$ is not zero. Since $\rho'$ is
irreducible and $\mathrm{im}\, T$ is G-invariant, $\mathrm{im}\, T = V'$. Therefore $T$ is an isomorphism.


(b) Suppose $V = V'$, so that $T$ is a linear operator on V. Choose an eigenvalue $\lambda$ of
$T$. Then $(T - \lambda I) = T_1$ is also G-invariant. Its kernel is nonzero because it contains
an eigenvector. Since $\rho$ is irreducible, $\ker T_1 = V$, which implies that $T_1 = 0$.
Therefore $T = \lambda I$. □


The averaging process can be used to create a G-invariant transformation from
any linear transformation $T: V \to V'$. To do this, we rewrite the condition (9.1) in
the form $T(v) = \rho'^{-1}_g(T(\rho_g(v)))$, or

(9.7)    $T(v) = g^{-1}(T(gv)).$

The average is the linear operator $\tilde{T}$ defined by

(9.8)    $\tilde{T}(v) = \frac{1}{N} \sum_g g^{-1}(T(gv)),$

where $N = |G|$ as before. If bases for $V, V'$ are given and if the matrices for
$\rho_g, \rho'_g, T, \tilde{T}$ are $R_g, R'_g, A, \tilde{A}$ respectively, then

(9.9)    $\tilde{A} = \frac{1}{N} \sum_g R'_{g^{-1}} A R_g.$

Since compositions of linear transformations and sums of linear transformations are
again linear, $\tilde{T}$ is a linear transformation. To show that it is G-invariant, we fix an
element $h \in G$ and let $g' = gh$. Reindexing as in the proof of Lemma (2.8),

$h^{-1}\tilde{T}(hv) = \frac{1}{N} \sum_g h^{-1}g^{-1}(T(ghv)) = \frac{1}{N} \sum_{g'} g'^{-1}(T(g'v)) = \tilde{T}(v).$

Therefore $\tilde{T}(hv) = h\tilde{T}(v)$. Since $h$ is arbitrary, this shows that $\tilde{T}$ is G-invariant.


It may happen that we end up with the trivial linear transformation, that is,
$\tilde{T} = 0$ though $T$ was not zero. In fact, Schur's Lemma tells us that we must get
$\tilde{T} = 0$ if $\rho$ and $\rho'$ are irreducible but not isomorphic. We will make good use of this
seemingly negative fact in the proof of the orthogonality relations.

When $\rho = \rho'$, the average can often be shown to be nonzero by using this
proposition.


(9.10) Proposition. Let $\rho$ be a representation of a finite group G on a vector space
V, and let $T: V \to V$ be a linear operator. Define $\tilde{T}$ by formula (9.8). Then
$\mathrm{trace}\, \tilde{T} = \mathrm{trace}\, T$. Thus if the trace of $T$ isn't zero, then $\tilde{T}$ is not zero either.


Proof. We compute as in formula (9.9), with $R' = R$. Since $\mathrm{trace}\, A =
\mathrm{trace}\, R_g^{-1} A R_g$, the proposition follows. □


Here is a sample calculation. Let $G = C_3 = \{1, x, x^2\}$, and let $\rho = \rho'$ be the
regular representation (Section 6) of G, so that $V = \mathbb{C}^3$ and

$R_x = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$

Let $T$ be the linear operator whose matrix is

$B = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$

Then the matrix of $\tilde{T}$ is

$\tilde{B} = \frac{1}{3}(B + R_x^{-1}BR_x + R_x^{-2}BR_x^2) = \frac{1}{3}(B + R_x^2BR_x + R_xBR_x^2) = \frac{1}{3}\begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{bmatrix}.$

Or, let $T$ be the linear operator whose matrix is the permutation matrix
corresponding to the transposition $y = (1\,2)$. The average over the group is a sum of the three
transpositions: $(y + x^{-1}yx + x^{-2}yx^2)/3 = (y + xy + x^2y)/3$. In this case,

$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ and $\tilde{P} = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$

Note that $\tilde{B}$ and $\tilde{P}$ commute with $R_x$ as claimed [see (9.4)], though the original
matrices $P$ and $B$ do not.
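The averaging process is easy to carry out by machine for the cyclic group of order 3. The following sketch works in exact rational arithmetic; the matrix `B`, with first row (2, 1, 0) and zeros elsewhere, is an illustrative choice, and the helper names are our own.

```python
from fractions import Fraction

R_x = [[0, 0, 1],
       [1, 0, 0],
       [0, 1, 0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def average(matrix):
    """(1/3) * sum over g in {1, x, x^2} of R_g^{-1} * matrix * R_g."""
    total = [[Fraction(0)] * 3 for _ in range(3)]
    power = [[int(i == j) for j in range(3)] for i in range(3)]   # R_x^0
    for _ in range(3):
        inverse = transpose(power)        # a permutation matrix is orthogonal
        term = matmul(matmul(inverse, matrix), power)
        total = [[t + Fraction(s, 3) for t, s in zip(tr, sr)] for tr, sr in zip(total, term)]
        power = matmul(power, R_x)
    return total

B = [[2, 1, 0], [0, 0, 0], [0, 0, 0]]     # illustrative choice of matrix
P = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]     # the transposition (1 2)

B_avg = average(B)
P_avg = average(P)
```

Both averages commute with `R_x`, and each has the same trace as the matrix it was computed from, as Proposition (9.10) requires.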

We will now prove the orthogonality relations, Theorem (5.9a). We saw in
Section 6 that formula (5.10) is a consequence of these relations.

Let $\chi, \chi'$ be two nonisomorphic irreducible characters, corresponding to
representations $\rho, \rho'$ of G on $V, V'$. Using the rule $\chi'(g^{-1}) = \overline{\chi'(g)}$, we can rewrite
the orthogonality $\langle \chi', \chi \rangle = 0$ to be proved as


(9.11)    $\frac{1}{N} \sum_g \chi'(g^{-1})\chi(g) = 0.$


Now Schur's Lemma asserts that every G-invariant linear transformation $V \to V'$
is zero. In particular, the linear transformation $\tilde{T}$ which we obtain by averaging any
linear transformation $T$ is zero. Taking into account formula (9.9), this proves the
following lemma:


(9.12) Lemma. Let $R, R'$ be nonisomorphic irreducible representations of G. Then

$\sum_g R'_{g^{-1}} A R_g = 0$

for every matrix $A$ of the appropriate shape. □


Let's warm up by checking orthogonality in the case that $\rho$ and $\rho'$ have
dimension 1. In this case, $R_g, R'_g$ are $1 \times 1$ matrices, that is, scalars, and $\chi(g) = R_g$.
If we set $A = 1$, then except for the factor $1/N$, (9.12) becomes (9.11), and we are
done.

Lemma (9.12) also implies orthogonality in higher dimensions, but only after a
small computation. Let us denote the entries of a matrix $M$ by $(M)_{ij}$, as we did in
Section 7 of Chapter 4. Then $\chi(g) = \mathrm{trace}\, R_g = \sum_i (R_g)_{ii}$. So $\langle \chi', \chi \rangle$ expands to


(9.13)    $\langle \chi', \chi \rangle = \frac{1}{N} \sum_g \sum_{i,j} (R'_{g^{-1}})_{ii}(R_g)_{jj}.$

We may reverse the order of summation. So to prove that $\langle \chi', \chi \rangle = 0$, it suffices to
show that for all $i, j$,


(9.14)    $\sum_g (R'_{g^{-1}})_{ii}(R_g)_{jj} = 0.$


The proof of the following lemma is elementary:


(9.15) Lemma. Let $M, N$ be matrices and let $P = M e_{\alpha\beta} N$, where $e_{\alpha\beta}$ is a matrix
unit of suitable size. The entries of $P$ are $(P)_{ij} = (M)_{i\alpha}(N)_{\beta j}$. □


We substitute $e_{ij}$ for $A$ in Lemma (9.12) and apply Lemma (9.15), obtaining

$0 = (0)_{ij} = \sum_g (R'_{g^{-1}} e_{ij} R_g)_{ij} = \sum_g (R'_{g^{-1}})_{ii}(R_g)_{jj},$

as required. This shows that $\langle \chi', \chi \rangle = 0$ if $\chi$ and $\chi'$ are characters of
nonisomorphic irreducible representations.

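Next, suppose that $\chi = \chi'$. We have to show that $\langle \chi, \chi \rangle = 1$. Averaging $A$ as
in (9.9) need not give zero now, but according to Schur's Lemma, it gives a scalar
matrix:


(9.16)    $\frac{1}{N} \sum_g R_{g^{-1}} A R_g = \tilde{A} = aI.$


By Proposition (9.10), $\mathrm{trace}\, \tilde{A} = \mathrm{trace}\, A$, and $\mathrm{trace}\, \tilde{A} = da$, where $d = \dim \rho$. So

(9.17)    $a = \mathrm{trace}\, A/d.$

We set $A = e_{ij}$ in (9.16) and apply Lemma (9.15) again, obtaining

(9.18)    $(aI)_{ij} = \frac{1}{N} \sum_g (R_{g^{-1}} e_{ij} R_g)_{ij} = \frac{1}{N} \sum_g (R_{g^{-1}})_{ii}(R_g)_{jj},$

where $a = (\mathrm{trace}\, e_{ij})/d$. The left-hand side of (9.18) is zero if $i \neq j$ and is equal to
$1/d$ if $i = j$. This shows that the terms with $i \neq j$ in (9.13) vanish, and that

$\langle \chi, \chi \rangle = \frac{1}{N} \sum_g \sum_i (R_{g^{-1}})_{ii}(R_g)_{ii} = \sum_i \Bigl[ \frac{1}{N} \sum_g (R_{g^{-1}})_{ii}(R_g)_{ii} \Bigr] = \sum_i 1/d = 1.$


This completes the proof that the irreducible characters $\chi_1, \chi_2, \ldots$ are orthonormal.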
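Formula (9.18) can be tested on a small example. The sketch below uses a two-dimensional representation of $D_3$ by integer matrices: the generators `X` and `Y` are an assumed model, chosen for exact arithmetic, and are not the matrices (4.6). This representation is not unitary, but (9.18) involves $R_{g^{-1}}$ rather than a conjugate transpose, so unitarity is not needed.

```python
from fractions import Fraction

X = ((0, -1), (1, -1))     # order 3
Y = ((0, 1), (1, 0))       # order 2
I2 = ((1, 0), (0, 1))

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

# Close {X, Y} under multiplication to list all six matrices R_g.
group = {I2}
frontier = [I2]
while frontier:
    g = frontier.pop()
    for s in (X, Y):
        h = matmul(g, s)
        if h not in group:
            group.add(h)
            frontier.append(h)

def inverse(g):
    return next(h for h in group if matmul(g, h) == I2)

def averaged_entry(i, j):
    """(1/N) * sum over g of (R_{g^-1})_{ii} * (R_g)_{jj}, the right side of (9.18)."""
    return Fraction(sum(inverse(g)[i][i] * g[j][j] for g in group), len(group))

entries = [[averaged_entry(i, j) for j in range(2)] for i in range(2)]
```

With $d = 2$, the diagonal entries of `entries` equal $1/d = 1/2$ and the off-diagonal entries vanish.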

We still have to show that the number of irreducible characters is equal to the
number of conjugacy classes, or, equivalently, that the irreducible characters span
the space $\mathcal{C}$ of class functions. Let the subspace they span be $\mathcal{H}$. Then [Chapter 7
(2.15)] $\mathcal{C} = \mathcal{H} \oplus \mathcal{H}^{\perp}$. So we must show that $\mathcal{H}^{\perp} = 0$, or that a class function $\phi$
which is orthogonal to every character is zero.

Assume such a class function $\phi$ is given. So $\phi$ is a complex-valued function on G
which is constant on conjugacy classes. Let $\chi$ be the character of a representation $\rho$,
and consider the linear operator $T: V \to V$ defined by


(9.19)    $T = \frac{1}{N} \sum_g \overline{\phi(g)}\, \rho_g.$

Its trace is

(9.20)    $\mathrm{trace}\, T = \frac{1}{N} \sum_g \overline{\phi(g)}\, \chi(g) = \langle \phi, \chi \rangle = 0,$

because $\phi$ is orthogonal to $\chi$.


(9.21) Lemma. The operator $T$ defined by (9.19) is G-invariant.


Proof. We have to show (9.2) $\rho_h \circ T = T \circ \rho_h$, or $T = \rho_h^{-1} \circ T \circ \rho_h$, for
every $h \in G$. Let $g'' = h^{-1}gh$. Then as $g$ runs over the group G, so does $g''$, and of
course $\rho_h^{-1}\rho_g\rho_h = \rho_{g''}$. Also $\phi(g) = \phi(g'')$ because $\phi$ is a class function. Therefore

$\rho_h^{-1} T \rho_h = \frac{1}{N} \sum_g \overline{\phi(g)}\, \rho_h^{-1}\rho_g\rho_h = \frac{1}{N} \sum_{g''} \overline{\phi(g'')}\, \rho_{g''} = T,$

as required. □


Now if $\rho$ is irreducible as well, then Schur's Lemma (9.6b) applies and shows
that $T = cI$. Since $\mathrm{trace}\, T = 0$ (9.20), it follows that $T = 0$. Any representation $\rho$
is a direct sum of irreducible representations, and (9.19) is compatible with direct
sums. Therefore $T = 0$ in every case.

We apply this to the case that $\rho = \rho^{\mathrm{reg}}$ is the regular representation. The
vector space is $V(G)$. We compute $T(1)$, where 1 denotes the identity element of G. By
definition of the regular representation, $\rho_g(1) = g$. So


(9.22)    $0 = T(1) = \frac{1}{N} \sum_g \overline{\phi(g)}\, \rho_g(1) = \frac{1}{N} \sum_g \overline{\phi(g)}\, g.$


Since the elements of G are a basis for $V = V(G)$, this shows that $\overline{\phi(g)} = 0$ for all
$g$, hence that $\phi = 0$. □


10. REPRESENTATIONS OF THE GROUP $SU_2$


Much of what was done in Sections 6 to 9 carries over without change to continuous 
representations of compact groups G, once a translation-invariant (Haar) measure dg 
has been found. One just replaces summation by an integral over the group. How- 
ever, there will be infinitely many irreducible representations if G is not finite. 

When we speak of a representation $\rho$ of a compact group, we shall always
mean a continuous homomorphism to $GL(V)$, where V is a finite-dimensional
complex vector space. The character $\chi$ of $\rho$ is then a continuous, complex-valued
function on G, which is constant on each conjugacy class. (It is a class function.)

For example, the identity map is a two-dimensional representation of $SU_2$. Its
character is the usual trace of $2 \times 2$ matrices. We will call this the standard
representation of $SU_2$. The conjugacy classes in $SU_2$ are the sets of matrices with given
trace $2c$. They correspond to the latitudes $\{x_1 = c\}$ in the 3-sphere $SU_2$ [Chapter 8
(2.8)]. Because of this, a class function on $SU_2$ depends only on $x_1$. So such a
function can be thought of as a continuous function on the interval $[-1, 1]$. In the
notation of Chapter 8 (2.5), the character of the standard representation of $SU_2$ is

$\chi(P) = \mathrm{trace}\, P = a + \overline{a} = 2x_1.$


Let $|G|$ denote the volume of our compact group G with respect to the
measure $dg$:


(10.1)    $|G| = \int_G 1\, dg.$

Then the hermitian form which replaces (5.8) is


(10.2)    $\langle \chi, \chi' \rangle = \frac{1}{|G|} \int_G \overline{\chi(g)}\, \chi'(g)\, dg.$


With this definition, the orthogonality relations carry over. The proofs of the
following extensions to compact groups are the same as for finite groups:


(10.3) Theorem.


(a) Every finite-dimensional representation of a compact group G is a direct sum
of irreducible representations.

(b) Schur's Lemma: Let $\rho, \rho'$ be irreducible representations, and let $T: V \to V'$
be a G-invariant linear transformation. Then either $T$ is an isomorphism, or
else $T = 0$. If $\rho = \rho'$, then $T$ is multiplication by a scalar.

(c) The characters of the irreducible representations are orthogonal with respect to
the form (10.2).

(d) If the characters of two representations are equal, then the representations are
isomorphic.

(e) A character $\chi$ has the property $\langle \chi, \chi \rangle = 1$ if and only if $\rho$ is irreducible.

(f) If G is abelian, then every irreducible representation is one-dimensional.


However, the other parts of Theorem (5.9) do not carry over directly. The
most significant change in the theory is in Section 6. If G is connected, it cannot
operate continuously and nontrivially on a finite set, so finite-dimensional
representations cannot be obtained from actions on sets. In particular, the regular
representation is not finite-dimensional. Analytic methods are needed to extend that part of
the theory.

Since a Haar measure is easy to find for the groups $U_1$ and $SU_2$, we may
consider all of (10.3) proved for them.

Representations of the circle group $U_1$ are easy to describe, but they are
fundamental for an understanding of arbitrary compact groups. It will be convenient to
use additive and multiplicative notations interchangeably:


(10.4)    $SO_2(\mathbb{R}) \to U_1$, (rotation by $\theta$) $\mapsto e^{i\theta} = \alpha$.

(10.5) Theorem. The irreducible representations of $U_1$ are the $n$th power maps:

$U_1 \xrightarrow{\ n\ } U_1,$

sending $\alpha \mapsto \alpha^n$, or $\theta \mapsto n\theta$. There is one such representation for every
integer $n$.

Proof. By (10.3f), the irreducible representations are all one-dimensional,
and by (3.5), they are conjugate to unitary representations. Since $GL_1 = \mathbb{C}^{\times}$ is
abelian, conjugation is trivial, so a one-dimensional matrix representation is
automatically unitary. Hence an irreducible representation of $U_1$ is a continuous
homomorphism from $U_1$ to itself. We have to show that the only such homomorphisms are the
$n$th power maps.


(10.6) Lemma. The continuous homomorphisms $\psi: \mathbb{R}^+ \to \mathbb{R}^+$ are
multiplication by a scalar: $\psi(x) = cx$, for some $c \in \mathbb{R}$.


Proof. Let $\psi: \mathbb{R}^+ \to \mathbb{R}^+$ be a continuous homomorphism. We will show that
$\psi(x) = x\psi(1)$ for all $x$. This will show that $\psi$ is multiplication by $c = \psi(1)$.

Since $\psi$ is a homomorphism, $\psi(nr) = \psi(r + \cdots + r) = n\psi(r)$, for any real
number $r$ and any nonnegative integer $n$. In particular, $\psi(n) = n\psi(1)$. Also,
$\psi(-n) = -\psi(n) = -n\psi(1)$. Therefore $\psi(n) = n\psi(1)$ for every integer $n$. Next we
let $r = m/n$ be a rational number. Then $n\psi(r) = \psi(nr) = \psi(m) = m\psi(1)$. Dividing
by $n$, we find $\psi(r) = r\psi(1)$ for every rational number $r$. Since the rationals are
dense in $\mathbb{R}$ and $\psi$ is continuous, $\psi(x) = cx$ for all $x$. □


(10.7) Lemma. The continuous homomorphisms $\varphi\colon \mathbb{R}^+ \to U_1$ are of the form
$\varphi(x) = e^{icx}$ for some $c \in \mathbb{R}$.


Proof. If $\varphi$ is differentiable, this can be proved using the exponential map
of Section 5, Chapter 8. We prove it now for any continuous homomorphism. We
consider the exponential homomorphism $\varepsilon\colon \mathbb{R}^+ \to U_1$ defined by $\varepsilon(x) = e^{ix}$. This
homomorphism wraps the real line around the unit circle with period $2\pi$ [see Figure
(10.8)]. For any continuous function $\varphi\colon \mathbb{R}^+ \to U_1$ such that $\varphi(0) = 1$, there is
a unique continuous lifting $\psi$ of this function to the real line such that $\psi(0) = 0$. In
other words, we can find a unique continuous function $\psi\colon \mathbb{R} \to \mathbb{R}$ such that
$\psi(0) = 0$ and $\varphi(x) = \varepsilon(\psi(x))$ for all $x$. The lifting is constructed starting with the
definition $\psi(0) = 0$ and then extending $\psi$ a small interval at a time.

We claim that if $\varphi$ is a homomorphism, then its lifting $\psi$ is also a homomorphism.
If this is shown, then we will conclude that $\psi(x) = cx$ for some $c$ by (10.6),
hence that $\varphi(x) = e^{icx}$, as required.

The relation $\varphi(x+y) = \varphi(x)\varphi(y)$ implies that $\varepsilon(\psi(x+y) - \psi(x) - \psi(y)) = 1$.
Hence $\psi(x+y) - \psi(x) - \psi(y) = 2\pi m$ for some integer $m$ which depends continuously
on $x$ and $y$. Varying continuously, $m$ must be constant, and setting $x = y = 0$
shows that $m = 0$. So $\psi$ is a homomorphism, as claimed. □


Now to complete the proof of Theorem (10.5), let $\rho\colon U_1 \to U_1$ be a continuous
homomorphism. Then $\varphi = \rho \circ \varepsilon\colon \mathbb{R}^+ \to U_1$ is also a continuous homomorphism,
so $\varphi(x) = e^{icx}$ by (10.7). Moreover, $\varphi(2\pi) = \rho(1) = 1$, which is the case if and
only if $c$ is an integer, say $n$. Then $\rho(e^{ix}) = e^{inx} = (e^{ix})^n$. □


(10.8) Figure. The exponential homomorphism $\varepsilon$ wraps the real line $\mathbb{R}^+$ around the unit circle.
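The last step of this proof can be checked numerically. The sketch below is only an illustration, not part of the argument: the names `rho`, `phi`, and `descends_to_circle` are ours, and floating-point arithmetic stands in for exact complex numbers. It tests the homomorphism property of the $n$th power maps and the condition $\varphi(2\pi) = 1$ that forces $c$ to be an integer.

```python
import cmath
import math

def rho(n, alpha):
    """n-th power map on the unit circle: alpha -> alpha**n."""
    return alpha ** n

def phi(c, x):
    """The homomorphism x -> e^{icx} of Lemma (10.7)."""
    return cmath.exp(1j * c * x)

def descends_to_circle(c, tol=1e-9):
    """phi_c factors through the circle exactly when phi_c(2*pi) = 1."""
    return abs(phi(c, 2 * math.pi) - 1) < tol

alpha = cmath.exp(0.7j)
beta = cmath.exp(2.1j)
for n in (-3, 0, 1, 4):
    # Homomorphism property: rho_n(alpha*beta) = rho_n(alpha) * rho_n(beta).
    assert cmath.isclose(rho(n, alpha * beta), rho(n, alpha) * rho(n, beta))
```

For instance, `descends_to_circle(3)` holds, while `descends_to_circle(0.5)` fails because $e^{i\pi} = -1$.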


Now let us examine the representations of the group $SU_2$. Again, there is an
infinite family of irreducible representations which arise naturally, and they turn out
to form a complete list. Let $V_n$ be the set of homogeneous polynomials of degree $n$ in
variables $u, v$. Such a polynomial will have the form


(10.9)    $f(u,v) = x_0u^n + x_1u^{n-1}v + \cdots + x_{n-1}uv^{n-1} + x_nv^n$,


where the coefficients $x_i$ are complex numbers. Obviously, $V_n$ is a vector space of
dimension $n + 1$, with basis $(u^n, u^{n-1}v, \ldots, v^n)$. The group $G = GL_2$ operates on $V_n$
in the following way: Let $P \in GL_2$, say

$P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

Let $P$ act on the basis $(u,v)$ of $V_1$ as usual:


$(u', v') = (u,v)P = (au + cv,\; bu + dv)$;


define $\rho_n(P)$ by the rule

(10.10)    $u^iv^j \mapsto u'^iv'^j$  and
           $f(u,v) \mapsto x_0u'^n + x_1u'^{n-1}v' + \cdots + x_nv'^n$.

This is a representation

(10.11)    $\rho_n\colon G \to GL(V_n) \approx GL_{n+1}$.

The trivial representation is $\rho_0$, and the standard representation is $\rho_1$.
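The rule (10.10) is mechanical enough to be carried out by a short program. The following is a minimal sketch under our own conventions, not a construction taken from the text: a homogeneous polynomial is stored as the list of its coefficients $x_0, \ldots, x_n$ with respect to the basis $(u^n, u^{n-1}v, \ldots, v^n)$, and the helper names `poly_mul`, `rho_matrix`, and `mat_mul` are hypothetical.

```python
def poly_mul(p, q):
    """Multiply homogeneous polynomials in u, v given as coefficient lists;
    p[k] is the coefficient of u^(deg - k) v^k."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def rho_matrix(n, P):
    """Matrix of rho_n(P) on the basis (u^n, u^(n-1) v, ..., v^n)."""
    (a, b), (c, d) = P
    u_new = [a, c]  # u' = a*u + c*v
    v_new = [b, d]  # v' = b*u + d*v
    columns = []
    for k in range(n + 1):
        col = [1]  # build the image u'^(n-k) v'^k of the k-th basis monomial
        for _ in range(n - k):
            col = poly_mul(col, u_new)
        for _ in range(k):
            col = poly_mul(col, v_new)
        columns.append(col)
    # The coordinate vectors are the columns, so transpose into rows.
    return [[columns[k][r] for k in range(n + 1)] for r in range(n + 1)]

def mat_mul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]
```

With integer entries the arithmetic is exact, so one can test the homomorphism property $\rho_n(PQ) = \rho_n(P)\rho_n(Q)$ directly and compare `rho_matrix(2, P)` with the matrix (10.12).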
For example, the matrix of $\rho_2(P)$ is

(10.12)    $R_P = \begin{bmatrix} a^2 & ab & b^2 \\ 2ac & ad + bc & 2bd \\ c^2 & cd & d^2 \end{bmatrix}$.

Its first column is the coordinate vector of $\rho_2(P)(u^2) = (au + cv)^2 = a^2u^2 +
2acuv + c^2v^2$, and so on.


(10.13) Theorem. The representations $\rho_n$ $(n = 0, 1, 2, \ldots)$ obtained by restricting
(10.11) to the subgroup $SU_2$ are the irreducible representations of $SU_2$.


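Proof. We consider the subgroup $T$ of $SU_2$ of diagonal matrices


(10.14)    $\begin{bmatrix} \alpha & 0 \\ 0 & \bar\alpha \end{bmatrix}$,


where $\alpha = e^{i\theta}$. This group is isomorphic to $U_1$. The conjugacy class of an arbitrary
unitary matrix $P \in SU_2$ contains two diagonal matrices, namely


$\begin{bmatrix} \lambda & 0 \\ 0 & \bar\lambda \end{bmatrix}$  and  $\begin{bmatrix} \bar\lambda & 0 \\ 0 & \lambda \end{bmatrix}$,


where $\lambda, \bar\lambda$ are the eigenvalues of $P$ [Chapter 7 (7.4)]. They coincide only when
$\lambda = \pm 1$. So every conjugacy class except $\{I\}$ and $\{-I\}$ intersects $T$ in a pair of
matrices.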


(10.15) Proposition.


(a) A class function on $SU_2$ is determined by its restriction to the subgroup $T$.
(b) The restriction of a class function $\varphi$ to $T$ is an even function, which means that


$\varphi(\alpha) = \varphi(\bar\alpha)$  or  $\varphi(\theta) = \varphi(-\theta)$. □


Next, any representation $\rho$ of $SU_2$ restricts to a representation on the subgroup $T$,
and $T$ is isomorphic to $U_1$. The restriction to $T$ of an irreducible representation of
$SU_2$ will usually be reducible, but it can be decomposed into a direct sum of irreducible
representations of $T$. Therefore the restriction of the character $\chi$ to $T$ gives
us a sum of irreducible characters on $U_1$. Theorem (10.5) tells us what the irreducible
characters of $T$ are: They are the $n$th powers $e^{in\theta}$, $n \in \mathbb{Z}$. Therefore we find:


(10.16) Proposition. The restriction to $T$ of a character $\chi$ on $SU_2$ is a finite sum of
exponential functions $e^{in\theta}$. □


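Let us calculate the restriction to $T$ of the character $\chi_n$ of $\rho_n$ (10.11). The matrix
(10.14) acts on monomials by

$u^iv^j \mapsto (\alpha^iu^i)(\bar\alpha^jv^j) = \alpha^{i-j}u^iv^j$.


Therefore, its matrix, acting on the basis $(u^n, u^{n-1}v, \ldots, v^n)$, is the diagonal matrix

$\operatorname{diag}(\alpha^n, \alpha^{n-2}, \ldots, \alpha^{-n})$,

and the value of the character is


(10.17)    $\chi_n(\alpha) = \alpha^n + \alpha^{n-2} + \cdots + \alpha^{-n} = e^{in\theta} + e^{i(n-2)\theta} + \cdots + e^{-in\theta}$,

or

(10.18)    $\chi_0 = 1$
           $\chi_1 = 2\cos\theta = e^{i\theta} + e^{-i\theta}$
           $\chi_2 = 1 + 2\cos 2\theta = e^{2i\theta} + 1 + e^{-2i\theta}$
           $\chi_3 = 2\cos 3\theta + 2\cos\theta = e^{3i\theta} + e^{i\theta} + e^{-i\theta} + e^{-3i\theta}$
           $\;\vdots$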
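The list (10.18) is easy to confirm numerically. The sketch below is an illustration only: the function name `chi` is ours, and floating-point values replace exact ones. It sums the diagonal entries $\alpha^n, \alpha^{n-2}, \ldots, \alpha^{-n}$ as in (10.17).

```python
import cmath
import math

def chi(n, theta):
    """Character (10.17) of rho_n at the diagonal matrix with alpha = e^{i*theta}."""
    alpha = cmath.exp(1j * theta)
    # Diagonal entries are alpha^n, alpha^(n-2), ..., alpha^(-n).
    return sum(alpha ** (n - 2 * k) for k in range(n + 1))

theta = 0.9
# chi_1 = 2 cos(theta) and chi_2 = 1 + 2 cos(2 theta), as in (10.18).
assert abs(chi(1, theta) - 2 * math.cos(theta)) < 1e-12
assert abs(chi(2, theta) - (1 + 2 * math.cos(2 * theta))) < 1e-12
```

Two further checks are worth making: `chi(n, 0.0)` equals the dimension $n + 1$, and for $n \ge 2$ the difference `chi(n, theta) - chi(n - 2, theta)` equals $2\cos n\theta$, since all the middle terms cancel.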


Now let $\chi'$ be any irreducible character on $SU_2$. Its restriction to $T$ is even
(10.15b) and is a sum of exponentials $e^{in\theta}$ (10.16). To be even, $e^{in\theta}$ and $e^{-in\theta}$ must
occur with the same coefficient, so the character is a linear combination of the functions
$\cos n\theta = \tfrac12(e^{in\theta} + e^{-in\theta})$. The functions (10.17) form a basis for the vector
space spanned by $\{\cos n\theta\}$. Therefore

(10.19)    $\chi' = \sum_i r_i\chi_i$,


where $r_i$ are rational numbers. A priori, this is true on $T$, but by (10.15a) it is also
true on all of $SU_2$. Clearing denominators and bringing negative terms to the left in
(10.19) yields a relation of the form


(10.20)    $m\chi' + \sum_j n_j\chi_j = \sum_k n_k\chi_k$,


where $n_j, n_k$ are positive integers and the index sets $\{j\}$, $\{k\}$ are disjoint. This relation
implies


$m\rho' \oplus \sum_j n_j\rho_j \approx \sum_k n_k\rho_k$.


Therefore $\rho'$ is one of the representations $\rho_k$. This completes the proof of Theorem
(10.13). □


We leave the obvious generalizations to the reader. 


Israel Herstein 


EXERCISES

1. Definition of a Group Representation


1. Let $\rho$ be a representation of a group $G$. Show that $\det\rho$ is a one-dimensional
   representation.

2. Suppose that $G$ is a group with a faithful representation by diagonal matrices. Prove that
   $G$ is abelian.

3. Prove that the rule $S_n \to \mathbb{R}^\times$ defined by $p \mapsto \operatorname{sign} p$ is a one-dimensional
   representation of the symmetric group.

4. Prove that the only one-dimensional representations of the symmetric group $S_5$ are the
   trivial representation defined by $\rho(g) = 1$ for all $g$ and the sign representation.

5. (a) Write the standard representation of the octahedral group $O$ by rotations explicitly,
       choosing a suitable basis for $\mathbb{R}^3$.
   (b) Do the same for the dihedral group $D_n$.
   *(c) Do the same for the icosahedral group $I$.
6. Show that the rule $\sigma(\theta) = \begin{bmatrix} \alpha & \alpha^2 - \alpha \\ 0 & \alpha^2 \end{bmatrix}$, $\alpha = e^{i\theta}$, is a representation of $SO_2$, when
   a rotation in $SO_2$ is represented by its angle.


7. Let $H$ be a subgroup of index 2 of a group $G$, and let $\rho\colon G \to GL(V)$ be a representation.
   Define $\rho'\colon G \to GL(V)$ by the rule $\rho'(g) = \rho(g)$ if $g \in H$, and $\rho'(g) = -\rho(g)$
   if $g \notin H$. Prove that $\rho'$ is a representation of $G$.

8. Prove that every finite group $G$ has a faithful representation on a finite-dimensional
   complex vector space.

9. Let $N$ be a normal subgroup of a group $G$. Relate representations of $G/N$ to representations
   of $G$.

10. Choose three axes in $\mathbb{R}^3$ passing through the vertices of a regular tetrahedron centered at
    the origin. (This is not an orthogonal coordinate system.) Find the coordinates of the
    fourth vertex, and write the matrix representation of the tetrahedral group $T$ in this
    coordinate system explicitly.


2. G-Invariant Forms and Unitary Representations


1. (a) Verify that the form $X^*BY$ (2.10) is $G$-invariant.
   (b) Find an orthonormal basis for this form, and determine the matrix $P$ of change of basis.
       Verify that $PAP^{-1}$ is unitary.

2. Prove the real analogue of (2.2): Let $R\colon G \to GL_n(\mathbb{R})$ be a representation of a finite
   group $G$. There is a $P \in GL_n(\mathbb{R})$ such that $PR_gP^{-1}$ is orthogonal for every $g \in G$.

3. Let $\rho\colon G \to SL_2(\mathbb{R})$ be a faithful representation of a finite group by real $2 \times 2$ matrices
   of determinant 1. Prove that $G$ is a cyclic group.

4. Determine all finite groups which have a faithful real two-dimensional representation.

5. Describe the finite groups $G$ which admit faithful real three-dimensional representations
   with determinant 1.

6. Let $V$ be a hermitian vector space. Prove that the unitary operators on $V$ form a subgroup
   $U(V)$ of $GL(V)$, and that a representation $\rho$ on $V$ has image in $U(V)$ if and only if the
   form $\langle\,,\rangle$ is $G$-invariant.


7. Let $\langle\,,\rangle$ be a nondegenerate skew-symmetric form on a vector space $V$, and let $\rho$ be a
   representation of a finite group $G$ on $V$.
   (a) Prove that the averaging process (2.7) produces a $G$-invariant skew-symmetric form
       on $V$.
   (b) Does this prove that every finite subgroup of $GL_{2n}$ is conjugate to a subgroup of
       $SP_{2n}$?

8. (a) Let $R$ be the standard two-dimensional representation of $D_3$, with the triangle situated
       so that the $x$-axis is a line of reflection. Rewrite this representation in terms of the
       basis $x' = x$ and $y' = x + y$.
   (b) Use the averaging process to obtain a $G$-invariant form from dot product in the
       $(x', y')$-coordinates.


3. Compact Groups


1. Prove that $dx/x$ is a Haar measure on the multiplicative group $\mathbb{R}^\times$.

2. (a) Let $P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}$ be a variable $2 \times 2$ matrix, and let $dV = dp_{11}\,dp_{12}\,dp_{21}\,dp_{22}$ denote
       the ordinary volume form on $\mathbb{R}^{2\times 2}$. Show that $(\det P)^{-2}\,dV$ is a Haar measure on
       $GL_2(\mathbb{R})$.
   (b) Generalize the results of (a).

*3. Show that the form $\dfrac{dx_2\,dx_3\,dx_4}{x_1}$ on the 3-sphere defines a Haar measure on $SU_2$. What
    replaces this expression at points where $x_1 = 0$?
4. Take the complex representation of SO; in R* given by 


and reduce it to a unitary representation by averaging the hermitian product on R°. 


4. G-Invariant Subspaces and Irreducible Representations 


1. Prove that the standard three-dimensional representation of the tetrahedral group T is ir- 
reducible as a complex representation. 

2. Determine all irreducible representations of a cyclic group $C_n$.

3. Determine the representations of the icosahedral group $I$ which are not faithful.

4. Let $\rho$ be a representation of a finite group $G$ on a vector space $V$ and let $v \in V$.
   (a) Show that averaging $gv$ over $G$ gives a vector $\bar v \in V$ which is fixed by $G$.
   (b) What can you say about this vector if $\rho$ is an irreducible representation?

5. Let $H \subset G$ be a subgroup, let $\rho$ be a representation of $G$ on $V$, and let $v \in V$. Let
   $w = \sum_{h \in H} hv$. What can you say about the order of the $G$-orbit of $w$?

6. Consider the standard two-dimensional representation of the dihedral group $D_n$ as
   symmetries of the $n$-gon. For which values of $n$ is it irreducible as a complex representation?

*7. Let $G$ be the dihedral group $D_3$, presented as in Chapter 5 (3.6).
    (a) Let $\rho$ be an irreducible unitary representation of dimension 2. Show that there is an
        orthonormal basis of $V$ such that $R_y = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.
    (b) Assume that $R_y$ is as above. Use the defining relations $yx = x^2y$, $x^3 = 1$ to determine
        the possibilities for $R_x$.
    (c) Prove that all irreducible two-dimensional representations of $G$ are isomorphic.
    (d) Let $\rho$ be any representation of $G$, and let $v \in V$ be an eigenvector for the operator
        $\rho_x$. Show that $v$ is contained in a $G$-invariant subspace $W$ of dimension $\le 2$.
    (e) Determine all irreducible representations of $G$.


5. Characters


1. Corollary (5.11) describes a basis for the space of class functions. Give another basis.

2. Find the decomposition of the standard two-dimensional rotation representation of the
   cyclic group $C_n$ by rotations into irreducible representations.

3. Prove or disprove: Let $\chi$ be a character of a finite group $G$, and define $\bar\chi(g) = \overline{\chi(g)}$.
   Then $\bar\chi$ is also a character of $G$.

4. Find the dimensions of the irreducible representations of the group $O$ of rotations of a
   cube, the quaternion group, and the dihedral groups $D_4$, $D_5$, and $D_6$.

5. Describe how to produce a unitary matrix by adjusting the entries of a character table.

6. Compare the character tables for the quaternion group and the dihedral group $D_3$.

7. Determine the character table for $D_6$.

8. (a) Determine the character table for the groups $C_5$ and $D_5$.
   (b) Decompose the restriction of each irreducible character of $D_5$ into irreducible
       characters of $C_5$.

9. (a) Let $\rho$ be a representation of dimension $d$, with character $\chi$. Prove that the kernel of $\rho$
       is the set of group elements such that $\chi(g) = d$.
   (b) Show that if $G$ has a proper normal subgroup, then there is a representation $\rho$ such
       that $\ker\rho$ is a proper subgroup.

*10. Let $\chi$ be the character of a representation $\rho$ of dimension $d$. Prove that $|\chi(g)| \le d$ for all
     $g \in G$, and that if $|\chi(g)| = d$, then $\rho(g) = \zeta I$, for some root of unity $\zeta$.

11. Let $G' = G/N$ be a quotient group of a finite group $G$, and let $\rho'$ be an irreducible
    representation of $G'$. Prove that the representation of $G$ defined by $\rho'$ is irreducible in two
    ways: directly, and using Theorem (5.9).

12. Find the missing rows in the character table below:


(1) G) © © ©) 
1 b d 


x ] 1 1 1 ] 
x2 1 ff =) Il 1 
i | 3° SA it =i 0 
anes) el I 0 


13. The table below is a partial character table of a finite group, in which $\zeta = \tfrac12(-1 + \sqrt3\,i)$
    and $\gamma = \tfrac12(-1 + \sqrt7\,i)$. The conjugacy classes are all there.


Os) a 


2x2 


oy 
y 


(a) Determine the order of the group and the number and the dimensions of the irreducible
    representations.
(b) Determine the remaining characters.
(c) Describe the group by generators and relations.

*14. Describe the commutator subgroup of a group $G$ in terms of the character table.
*15. Below is a partial character table. One conjugacy class is missing.


(CO a ea eA 6) 


(a) Complete the table. 

(b) Show that $u$ has order 2, $x$ has order 4, $w$ has order 6, and $v$ has order 3. Determine
the orders of the elements in the missing conjugacy class.

(c) Show that $v$ generates a normal subgroup.

(d) Describe the group. 

*16. (a) Find the missing rows in the character table below. 

(b) Show that the group G with this character table has a subgroup H of order 10, and 
describe this subgroup as a union of conjugacy classes. 

(c) Decide whether $H$ is $C_{10}$ or $D_5$.

(d) Determine the commutator subgroup of G. 

(e) Determine all normal subgroups of G. 

(f) Determine the orders of the elements a, b,c, d. 

(g) Determine the number of Sylow 2-subgroups and the number of Sylow 5-subgroups 
of this group. 


(meas) 6) en) 
1 b 


a 


I 
1 
| 
1 


*17. In the character table below, $\zeta = \tfrac12(-1 + \sqrt3\,i)$.


(1) Geom) OG) 1). @) 7) 
1 b f 
ve 1 1 1 1 1 1 
Pall 1 Wee. ee 
en! 1 i g 4 g g 
x4 1 1 | =e = g g 
Xs | | [eee OMe? lia 2 Ge 4 
xe | 1 1 -1l -1 -1 1 1 
piel Ome Om “O 0 


(a) Show that $G$ has a normal subgroup $N$ isomorphic to $D_7$, and determine the structure
    of $G/N$.
(b) Decompose the restrictions of each character to $N$ into irreducible $N$-characters.
(c) Determine the numbers of Sylow $p$-subgroups, for $p = 2$, 3, and 7.
(d) Determine the orders of the representative elements $c, d, e, f$.


6. Permutation Representations and the Regular Representation


1. Verify the values of the characters (6.4) and (6.5).

2. Use the orthogonality relations to decompose the character of the regular representation
   for the tetrahedral group.

3. Show that the dimension of any irreducible representation of a group $G$ of order $N > 1$ is
   at most $N - 1$.

4. Determine the character tables for the nonabelian groups of order 12.

5. Decompose the regular representation of $C_3$ into irreducible real representations.

6. Prove Corollary (6.8).

7. Let $\rho$ be the permutation representation associated to the operation of $D_3$ on itself by
   conjugation. Decompose the character of $\rho$ into irreducible characters.

8. Let $S$ be a $G$-set, and let $\rho$ be the permutation representation of $G$ on the space $V(S)$.
   Prove that the orbit decomposition of $S$ induces a direct sum decomposition of $\rho$.

9. Show that the standard representation of the symmetric group $S_n$ by permutation matrices
   is the sum of a trivial representation and an irreducible representation.

*10. Let $H$ be a subgroup of a finite group $G$. Given an irreducible representation $\rho$ of $G$, we
     may decompose its restriction to $H$ into irreducible $H$-representations. Show that every
     irreducible representation of $H$ can be obtained in this way.


7. The Representations of the Icosahedral Group


1. Compute the characters $\chi_2$, $\chi_4$, $\chi_5$ of $I$, and use the orthogonality relations to determine
   the remaining character $\chi_3$.

2. Decompose the representations of the icosahedral group on the sets of faces, edges, and
   vertices into irreducible representations.

3. The group $S_5$ operates by conjugation on its subgroup $A_5$. How does this action operate
   on the set of irreducible representations of $A_5$?

4. Derive an algorithm for checking that a group is simple by looking at its character table.

5. Use the character table of the icosahedral group to prove that it is a simple group.

6. Let $H$ be a subgroup of index 2 of a group $G$, and let $\sigma\colon H \to GL(V)$ be a representation.
   Let $a$ be an element of $G$ not in $H$. Define a conjugate representation
   $\sigma'\colon H \to GL(V)$ by the rule $\sigma'(h) = \sigma(a^{-1}ha)$.
   (a) Prove that $\sigma'$ is a representation of $H$.
   (b) Prove that if $\sigma$ is the restriction to $H$ of a representation of $G$, then $\sigma'$ is isomorphic
       to $\sigma$.
   (c) Prove that if $b$ is another element of $G$ not in $H$, then the representation $\sigma''(h) =
       \sigma(b^{-1}hb)$ is isomorphic to $\sigma'$.

7. (a) Choose coordinates and write the standard three-dimensional matrix representation
       of the octahedral group $O$ explicitly.
   (b) Identify the five conjugacy classes in $O$, and find the orders of its irreducible
       representations.
   (c) The group $O$ operates on these sets:
       (i) six faces of the cube
       (ii) three pairs of opposite faces
       (iii) eight vertices
       (iv) four pairs of opposite vertices
       (v) six pairs of opposite edges
       (vi) two inscribed tetrahedra
       Identify the irreducible representations of $O$ as summands of these representations,
       and compute the character table for $O$. Verify the orthogonality relations.
   (d) Decompose each of the representations (c) into irreducible representations.
   (e) Use the character table to find all normal subgroups of $O$.

8. (a) The icosahedral group $I$ contains a subgroup $T$, the stabilizer of one of the cubes
       [Chapter 6 (6.7)]. Decompose the restrictions to $T$ of the irreducible characters of $I$.
(b) Do the same thing as (a) with a subgroup D; of /. 


9. Here is the character table for the group $G = PSL_2(\mathbb{F}_7)$, with $\gamma = \tfrac12(-1 + \sqrt7\,i)$, $\gamma' =
   \tfrac12(-1 - \sqrt7\,i)$.


(1) (21) (24), (24) (42), (56) 


(a) Use it to give two different proofs that this group is simple. 
(b) Identify, so far as possible, the conjugacy classes of the elements 


Pik Pal 


and find matrices which represent the remaining conjugacy classes. 
(c) $G$ operates on the set of one-dimensional subspaces of $F^2$ ($F = \mathbb{F}_7$). Decompose the
    associated character into irreducible characters.


8. One-dimensional Representations


1. Prove that the abelian characters of a group G form a group. 

2. Determine the character group for the Klein four group and for the quaternion group. 

3. Let $A, B$ be matrices such that some power of each matrix is the identity and such that $A$
   and $B$ commute. Prove that there is an invertible matrix $P$ such that $PAP^{-1}$ and $PBP^{-1}$ are
   both diagonal.

4. Let $G$ be a finite abelian group. Show that the order of the character group is equal to the
   order of $G$.

5. Prove that the sign representation $p \mapsto \operatorname{sign} p$ and the trivial representation are the only
   one-dimensional representations of the symmetric group $S_n$.

6. Let $G$ be a cyclic group of order $n$, generated by an element $x$, and let $\zeta = e^{2\pi i/n}$.
   (a) Prove that the irreducible representations are $\rho_0, \ldots, \rho_{n-1}$, where $\rho_k\colon G \to \mathbb{C}^\times$ is
       defined by $\rho_k(x) = \zeta^k$.
   (b) Identify the character group of $G$.
   (c) Verify the orthogonality relations for $G$ explicitly.

7. (a) Let $\varphi\colon G \to G'$ be a homomorphism of abelian groups. Define an induced
       homomorphism $\hat\varphi\colon \hat G' \to \hat G$ between their character groups.
   (b) Prove that $\hat\varphi$ is surjective if $\varphi$ is injective, and conversely.


9. Schur's Lemma, and Proof of the Orthogonality Relations


1. Let $\rho$ be a representation of $G$. Prove or disprove: If the only $G$-invariant operators on $V$
   are multiplication by a scalar, then $\rho$ is irreducible.

2. Let $\rho$ be the standard three-dimensional representation of $T$, and let $\rho'$ be the permutation
   representation obtained from the action of $T$ on the four vertices. Prove by averaging
   that $\rho$ is a summand of $\rho'$.

3. Let $\rho = \rho'$ be the two-dimensional representation (4.6) of the dihedral group $D_3$, and let

ugh ‘ ; ;
A= | Use the averaging process to produce a G-invariant transformation from

   left multiplication by $A$.

4.
bel =1 =[h=1
(a) Show that Ry = Ry = | a 1 | defines a representation of D3.
1 =I =
   (b) We may regard the representation $\rho_2$ of (5.15) as a $1 \times 1$ matrix representation. Let
       $T$ be the linear transformation $\mathbb{C}^1 \to \mathbb{C}^3$ whose matrix is $(1, 0, 0)^t$. Use the averaging
       method to produce a $G$-invariant linear transformation from $T$, using $\rho_2$ and the
       representation $R$ defined in (a).
   (c) Do part (b), replacing $\rho_2$ by $\rho_1$ and $\rho_3$.
   (d) Decompose $R$ explicitly into irreducible representations.


10. Representations of the Group $SU_2$


1. Determine the irreducible representations of the rotation group $SO_3$.

2. Determine the irreducible representations of the orthogonal group $O_2$.

3. Prove that the orthogonal representation $SU_2 \to SO_3$ is irreducible, and identify its
   character in the list (10.18).

4. Prove that the functions (10.18) form a basis for the vector space spanned by $\{\cos n\theta\}$.

5. Left multiplication defines a representation of $SU_2$ on the space $\mathbb{R}^4$ with coordinates
   $x_1, \ldots, x_4$, as in Chapter 8, Section 2. Decompose the associated complex representation
   into irreducible representations.

6. (a) Calculate the four-dimensional volume of the 4-ball of radius $r$, $B^4 =
       \{x_1^2 + x_2^2 + x_3^2 + x_4^2 \le r^2\}$, by slicing with three-dimensional slices.
   (b) Calculate the three-dimensional volume of the 3-sphere $S^3$, again by slicing. It is
       advisable to review the analogous computation of the area of a 2-sphere first. You
       should find $\dfrac{d}{dr}$(volume of $B^4$) = (volume of $S^3$). If not, try again.

7. Prove the orthogonality relations for the irreducible characters (10.17) of $SU_2$ by integration
   over $S^3$.


Miscellaneous Problems


1. Prove that a finite simple group which is not of prime order has no nontrivial representation
   of dimension 2.

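2. Let $H$ be a subgroup of index 2 of a finite group $G$, and let $a$ be an element of $G$ not in
   $H$, so that $aH$ is the second coset of $H$ in $G$. Let $S\colon H \to GL_n$ be a matrix representation
   of $H$. Define a representation $\operatorname{ind} S\colon G \to GL_{2n}$ of $G$, called the induced representation,
   as follows:


   $(\operatorname{ind} S)_h = \begin{bmatrix} S_h & 0 \\ 0 & S_{a^{-1}ha} \end{bmatrix}$,    $(\operatorname{ind} S)_{ah} = \begin{bmatrix} 0 & S_{aha} \\ S_h & 0 \end{bmatrix}$.


   (a) Prove that $\operatorname{ind} S$ is a representation of $G$.
   (b) Describe the character $\chi_{\operatorname{ind} S}$ of $\operatorname{ind} S$ in terms of the character $\chi_S$ of $S$.
   (c) If $R\colon G \to GL_m$ is a representation of $G$, we may restrict it to $H$. We denote the
       restriction by $\operatorname{res} R\colon H \to GL_m$. Prove that $\operatorname{res}(\operatorname{ind} S) \approx S \oplus S'$, where $S'$ is the
       conjugate representation defined by $S'_h = S_{a^{-1}ha}$.
   (d) Prove Frobenius reciprocity: $\langle \chi_{\operatorname{ind} S}, \chi_R \rangle = \langle \chi_S, \chi_{\operatorname{res} R} \rangle$.
   (e) Use Frobenius reciprocity to prove that if $S$ and $S'$ are not isomorphic representations,
       then the induced representation $\operatorname{ind} S$ of $G$ is irreducible. On the other hand, if
       $S \approx S'$, then $\operatorname{ind} S$ is a sum of two irreducible representations $R, R'$.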

3. Let $H$ be a subgroup of index 2 of a group $G$, and let $R$ be a matrix representation of $G$.
   Let $R'$ denote the conjugate representation, defined by $R'_g = R_g$ if $g \in H$, and $R'_g =
   -R_g$ otherwise.
   (a) Show that $R'$ is isomorphic to $R$ if and only if the character of $R$ is identically zero on
       the coset $gH$, where $g \notin H$.
   (b) Use Frobenius reciprocity to show that $\operatorname{ind}(\operatorname{res} R) \approx R \oplus R'$.
   (c) Show that if $R$ is not isomorphic to $R'$, then $\operatorname{res} R$ is irreducible, and if these two
       representations are isomorphic, then $\operatorname{res} R$ is a sum of two irreducible representations
       of $H$.

*4. Using Frobenius reciprocity, derive the character table of $S_n$ from that of $A_n$ when
    (a) $n = 3$, (b) $n = 4$, (c) $n = 5$.

5. Determine the characters of the dihedral group $D_n$, using representations induced from
   $C_n$.


6. (a) Prove that the only element of $SU_2$ of order 2 is $-I$.
   (b) Consider the homomorphism $\varphi\colon SU_2 \to SO_3$. Let $A$ be an element of $SU_2$ such that
       $\varphi(A) = \bar A$ has finite order $\bar n$ in $SO_3$. Prove that the order $n$ of $A$ is either $\bar n$ or $2\bar n$. Also
       prove that if $\bar n$ is even, then $n = 2\bar n$.

7. Let $G$ be a finite subgroup of $SU_2$, and let $\bar G = \varphi(G)$, where $\varphi\colon SU_2 \to SO_3$ is the
   orthogonal representation (Chapter 8, Section 3). Prove the following.
   (a) If $|\bar G|$ is even, then $|G| = 2|\bar G|$ and $G = \varphi^{-1}(\bar G)$.
   (b) Either $G = \varphi^{-1}(\bar G)$, or else $G$ is a cyclic group of odd order.
   (c) Let $G$ be a cyclic subgroup of $SU_2$ of order $n$. Prove that $G$ is conjugate to the group
       generated by $\begin{bmatrix} \zeta & 0 \\ 0 & \zeta^{-1} \end{bmatrix}$, where $\zeta = e^{2\pi i/n}$.
   (d) Show that if $\bar G$ is the group $D_2$, then $G$ is the quaternion group. Determine the matrix
       representation of the quaternion group $H$ as a subgroup of $SU_2$ with respect to a
       suitable orthonormal basis in $\mathbb{C}^2$.
   (e) If $\bar G = T$, prove that $G$ is a group of order 24 which is not isomorphic to the symmetric
       group $S_4$.

*8. Let $\rho$ be an irreducible representation of a finite group $G$. How unique is the positive
    definite $G$-invariant hermitian form?

*9. Let $G$ be a finite subgroup of $GL_n(\mathbb{C})$. Prove that if $\sum_g \operatorname{tr} g = 0$, then $\sum_g g = 0$.

*10. Let $\rho\colon G \to GL(V)$ be a two-dimensional representation of a finite group $G$, and assume
     that 1 is an eigenvalue of $\rho_g$ for every $g \in G$. Prove that $\rho$ is a sum of two
     one-dimensional representations.

*11. Let $\rho\colon G \to GL_n(\mathbb{C})$ be an irreducible representation of a finite group $G$. Given any
     representation $\sigma\colon GL_n \to GL(V)$ of $GL_n$, we can consider the composition $\sigma \circ \rho$ as a
     representation of $G$.
     (a) Determine the character of the representation obtained in this way when $\sigma$ is left
         multiplication of $GL_n$ on the space $\mathbb{C}^{n\times n}$ of $n \times n$ matrices. Decompose $\sigma \circ \rho$ into
         irreducible representations in this case.
     (b) Find the character of $\sigma \circ \rho$ when $\sigma$ is the operation of conjugation on $M_n(\mathbb{C})$.


Rings


Bitte vergiß alles, was Du auf der Schule gelernt hast;
denn Du hast es nicht gelernt.


Emil Landau 


1. DEFINITION OF A RING


The integers form our basic model for the concept of a ring. They are closed under 
addition, subtraction, and multiplication, but not under division. 

Before going to the abstract definition of a ring, we can get some examples by 
considering subrings of the complex numbers. A subring of C is a subset which is 
closed under addition, subtraction, and multiplication and which contains 1. Thus 
any subfield [Chapter 3 (2.1)] is a subring. Another example is the ring of Gauss integers,
which are complex numbers of the form $a + bi$, where $a$ and $b$ are integers.
This ring is denoted by


(1.1)    $\mathbb{Z}[i] = \{a + bi \mid a, b \in \mathbb{Z}\}$.


The Gauss integers are the points of a square lattice in the complex plane. 

We can form a subring $\mathbb{Z}[\alpha]$ analogous to the ring of Gauss integers, starting
with any complex number $\alpha$. We define $\mathbb{Z}[\alpha]$ to be the smallest subring of $\mathbb{C}$ containing
$\alpha$, and we call it the subring generated by $\alpha$. It is not hard to describe this
ring. If a ring contains $\alpha$, then it contains all positive powers of $\alpha$ because it is
closed under multiplication. Also, it contains sums and differences of such powers,
and it contains 1. Therefore it contains every complex number $\beta$ which can be
expressed as a polynomial in $\alpha$ with integer coefficients:


(1.2)    $\beta = a_n\alpha^n + \cdots + a_1\alpha + a_0$, where $a_i \in \mathbb{Z}$.


On the other hand, the set of all such numbers is closed under the operations of addition,
subtraction, and multiplication, and it contains 1. So it is the subring generated
by $\alpha$. But $\mathbb{Z}[\alpha]$ will not be represented as a lattice in the complex plane in most
cases. For example, the ring $\mathbb{Z}[\tfrac12]$ consists of the rational numbers which can be
expressed as a polynomial in $\tfrac12$ with integer coefficients. These rational numbers can be
described simply as those whose denominator is a power of 2. They form a dense
subset of the real line.
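The description of $\mathbb{Z}[\tfrac12]$ can be tried out with exact rational arithmetic. The sketch below is illustrative only; the helper names `evaluate_at` and `is_power_of_two` are ours, and coefficients are listed from the constant term upward, so the list `[a_0, a_1, ..., a_n]` stands for the expression (1.2).

```python
from fractions import Fraction

def evaluate_at(coeffs, alpha):
    """Value of a_0 + a_1*alpha + ... + a_n*alpha^n for integer coefficients."""
    return sum((a * alpha ** k for k, a in enumerate(coeffs)), Fraction(0))

def is_power_of_two(m):
    """True for 1, 2, 4, 8, ...; False otherwise."""
    return m > 0 and (m & (m - 1)) == 0

half = Fraction(1, 2)
value = evaluate_at([5, -1, 3], half)  # 5 - 1/2 + 3/4
# Fraction reduces to lowest terms, so the denominator is a power of 2.
assert is_power_of_two(value.denominator)
```

Conversely, a fraction $m/2^k$ is the value of the one-term polynomial $m\,x^k$ at $x = \tfrac12$, which is why every such fraction lies in the ring.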

A complex number $\alpha$ is called algebraic if it is a root of a polynomial with integer
coefficients, that is, if some expression of the form (1.2) is zero. For example,
$i + 3$, $1/7$, $7 + \sqrt2$, and $\sqrt3 + \sqrt{-5}$ are algebraic numbers.

If there is no polynomial with integer coefficients having $\alpha$ as a root, then $\alpha$ is
called a transcendental number. The numbers $e$ and $\pi$ are transcendental, though it
is not easy to prove that they are. If $\alpha$ is transcendental, then two distinct polynomial
expressions (1.2) must represent different complex numbers. In this case the
elements of the ring $\mathbb{Z}[\alpha]$ correspond bijectively to polynomials $p(x)$ with integer
coefficients, by the rule $p(x) \leftrightarrow p(\alpha)$.

When $\alpha$ is algebraic there will be many polynomial expressions (1.2) which
represent the same complex number. For example, when $\alpha = i$, the powers $\alpha^n$ take
the four values $\pm 1$, $\pm i$. Using the relation $i^2 = -1$, every expression (1.2) can be
reduced to one whose degree in $i$ is $\le 1$. This agrees with the description given
above for the ring of Gauss integers.

The two kinds of numbers, algebraic and transcendental, are somewhat
analogous to the two possibilities, finite and infinite, for a cyclic group [Chapter 2
(2.7)].
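The reduction just described takes only a few lines to carry out. This is a minimal sketch with a hypothetical helper name, `reduce_at_i`; the coefficient list `[a_0, a_1, ..., a_n]` stands for the expression (1.2) with $\alpha = i$, and the result is the pair $(a, b)$ with value $a + bi$.

```python
def reduce_at_i(coeffs):
    """Reduce a_0 + a_1*i + ... + a_n*i^n to a + b*i using i^2 = -1."""
    a = b = 0
    for k, c in enumerate(coeffs):
        r = k % 4  # the powers of i repeat with period 4: 1, i, -1, -i
        if r == 0:
            a += c
        elif r == 1:
            b += c
        elif r == 2:
            a -= c
        else:
            b -= c
    return a, b

# Two different expressions (1.2) can represent the same Gauss integer:
assert reduce_at_i([0, 0, 1]) == reduce_at_i([-1])
```

Here the polynomial expressions $i^2$ and $-1$ give the same element, which is exactly the failure of uniqueness that occurs for algebraic numbers.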

The definition of abstract ring is similar to that of field [Chapter 3 (2.3)], ex- 
cept that multiplicative inverses are not required to exist: 


(1.3) Definition. A ring $R$ is a set with two laws of composition $+$ and $\times$, called
addition and multiplication, which satisfy these axioms:


(a) With the law of composition $+$, $R$ is an abelian group, with identity denoted
    by 0. This abelian group is denoted by $R^+$.

(b) Multiplication is associative and has an identity denoted by 1.

(c) Distributive laws: For all $a, b, c \in R$,


$(a + b)c = ac + bc$  and  $c(a + b) = ca + cb$.


A subring of a ring is a subset which is closed under the operations of addition, sub- 
traction, and multiplication and which contains the element 1. 


The terminology used is not completely standardized. Some people do not re- 
quire the existence of a multiplicative identity in a ring. We will study commutative 
rings in most of this book, that is, rings satisfying the commutative law ab = ba for 
multiplication. So let us agree that the word ring will mean commutative ring with 
identity, unless we explicitly mention noncommutativity. The two distributive laws 
(c) are equivalent for commutative rings. 

The ring $\mathbb{R}^{n\times n}$ of all $n \times n$ matrices with real entries is an important example
of a ring which is not commutative.

Besides subrings of $\mathbb{C}$, the most important rings are polynomial rings. Given
any ring $R$, a polynomial in $x$ with coefficients in $R$ is an expression of the form


(1.4)    $a_nx^n + \cdots + a_1x + a_0$,


with $a_i \in R$. The set of these polynomials forms a ring which is usually denoted by
$R[x]$. We will discuss polynomial rings in the next section.
Here are some more examples of rings: 


(1.5) Examples. 


(a) Any field is a ring. 
(b) The set $\mathcal{R}$ of continuous real-valued functions of a real variable $x$ forms a ring,
    with addition and multiplication of functions:


$[f + g](x) = f(x) + g(x)$  and  $[fg](x) = f(x)g(x)$.


(c) The zero ring R = {0} consists of a single element 0. 


In the definition of a field [Chapter 3 (2.3)], the multiplicative identity 1 is required
to lie in $F^\times = F - \{0\}$. Hence a field has at least two distinct elements,
namely 1 and 0. The relation $1 = 0$ has not been ruled out in a ring, but it occurs
only once:


(1.6) Proposition. Let $R$ be a ring in which $1 = 0$. Then $R$ is the zero ring.


Proof. We first note that $0a = 0$ for any element $a$ of a ring $R$. The proof is
the same as for vector spaces [Chapter 3 (1.6a)]. Assume that $1 = 0$ in $R$, and let $a$
be any element of $R$. Then $a = 1a = 0a = 0$. So every element of $R$ is 0, which
means that $R$ is the zero ring. □


Though multiplicative inverses are not required to exist in a ring, a particular 
element may have an inverse, and the inverse is unique if it exists. Elements which 
have multiplicative inverses are called units. For example, the units in the ring of in- 
tegers are 1 and -1, and the units in the ring R[x] of real polynomials are the 
nonzero constant polynomials. Fields are rings which are not the zero ring and in 
which every nonzero element is a unit. 

The identity element 1 of a ring is always a unit, and any reference to “the” 
unit element in R refers to the identity. This is ambiguous terminology, but it is too
late to change it. 


2. FORMAL CONSTRUCTION OF INTEGERS AND POLYNOMIALS 


We learn that the ring axioms hold for the integers in elementary school. However,
let us look again in order to see what is required in order to write down proofs of
properties such as the associative and distributive laws. Complete proofs require a
fair amount of writing, and we will only make a start here. It is customary to begin
by defining addition and multiplication for positive integers. Negative numbers are
introduced later. This means that several cases have to be treated as one goes along, 
which is boring, or else a clever notation has to be found to avoid such a case analy- 
sis. We will content ourselves with a description of the operations on positive in- 
tegers. Positive integers are also called natural numbers. 

The set N of natural numbers is characterized by these properties, called 
Peano’s axioms: 


(2.1)


(a) The set N contains a particular element 1.

(b) Successor function: There is a map $\sigma: N \to N$ that sends every integer
$n \in N$ to another integer, called the next integer or successor. This map is injective,
and for every $n \in N$, $\sigma(n) \neq 1$.

(c) Induction axiom: Suppose that a subset S of N has these properties:

(i) $1 \in S$;
(ii) if $n \in S$ then $\sigma(n) \in S$.
Then S contains every natural number: S = N.


The next integer $\sigma(n)$ will turn into $n + 1$ when addition is defined. At this stage
the notation $n + 1$ could be confusing. It is better to use a neutral notation, and we
will often denote the successor by $n'$ [$= \sigma(n)$]. Note that $\sigma$ is assumed to be injective,
so if m, n are distinct natural numbers, that is, if $m \neq n$, then $m', n'$ are distinct
too.

The successor function allows us to use the natural numbers for counting, 
which is the basis of arithmetic. 

Property (c) is the induction property of the integers. Intuitively, it says that
the natural numbers are obtained from 1 by repeatedly taking the next integer:
$N = \{1, 1', 1'', \ldots\}$ $(= \{1, 2, 3, \ldots\})$, that is, counting runs through all natural numbers.
This property is the formal basis of induction proofs.

Suppose that a statement $P_n$ is to be proved for every positive integer n, and let
S be the set of integers n such that $P_n$ is true. To say that $P_n$ is true for every n is the
same as saying that S = N. For this set S, the Induction Axiom translates into the
usual induction steps:


(2.2)    (i) $P_1$ is true;
         (ii) if $P_n$ is true then $P_{n'}$ is true.


We can also use Peano’s axioms to make recursive definitions. The phrase recursive
definition, or inductive definition, refers to the definition of a sequence of objects
$C_n$ indexed by the natural numbers in which each object is defined in terms of
the preceding one. The function $C_n = x^n$ is an example. A recursive definition of
this function is


$x^1 = x$ and $x^{n'} = x^n x$.

The important points are as follows:


(2.3)    (i) $C_1$ is defined;
         (ii) a rule is given for determining $C_{n'}$ $(= C_{n+1})$ from $C_n$.


It is intuitively clear that (2.3) determines the sequence $C_n$ uniquely, though to
prove this from Peano’s axioms is tricky. A natural approach to proving it would be
as follows: Let S be the set of integers n such that (2.3) determines $C_k$ for every
$k \le n$. Then (2.3i) shows that $1 \in S$. Also, (2.3ii) shows that if $n \in S$ then
$n' \in S$. The Induction Axiom shows that S = N, hence that $C_n$ is uniquely defined
for each n. Unfortunately, the relation $\le$ is not included in Peano’s axioms, so it
must be defined and its properties derived to start. A proof based on this approach is
therefore lengthy, so we won’t carry one out here.

Given the set of positive integers and the ability to make recursive definitions, 
we can define addition and multiplication of positive integers as follows: 


(2.4)    Addition:        $m + 1 = m'$ and $m + n' = (m + n)'$
         Multiplication:  $m \cdot 1 = m$ and $m \cdot n' = m \cdot n + m$.


In these definitions, we take an arbitrary integer m and then define addition and
multiplication for that integer m and for every n recursively. In this way, $m + n$ and
$m \cdot n$ are defined for all m and n.
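
The recursive definitions (2.4) can be sketched in Python. This is only an illustration under stated assumptions: natural numbers are modeled by positive Python integers, and the helper names `succ`, `pred`, `add`, and `mul` are hypothetical, not notation from the text. The successor is implemented with Python's built-in `+ 1`, so the sketch illustrates the shape of the recursion rather than a construction from Peano's axioms alone.

```python
def succ(n):
    """Successor map n -> n' (modeled here with Python's built-in + 1)."""
    return n + 1

def pred(n):
    """Return the k with k' = n; defined only for n != 1, since 1 is not a successor."""
    if n == 1:
        raise ValueError("1 is not a successor")
    return n - 1

def add(m, n):
    """Addition by recursion on n: m + 1 = m' and m + n' = (m + n)'."""
    if n == 1:
        return succ(m)
    return succ(add(m, pred(n)))

def mul(m, n):
    """Multiplication by recursion on n: m . 1 = m and m . n' = m . n + m."""
    if n == 1:
        return m
    return add(mul(m, pred(n)), m)

# Spot-check the associative law for addition on a small range.
small = range(1, 6)
assoc_ok = all(add(add(a, b), n) == add(a, add(b, n))
               for a in small for b in small for n in small)
```

The recursion depth grows with n, so this sketch is meant for small inputs only.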

The proofs of the associative, commutative, and distributive laws for the in- 
tegers are exercises in induction which might be called “Peano playing.” We will 
carry out two of the verifications here as samples. 


Proof of the associative law for addition. We are to prove that
$(a + b) + n = a + (b + n)$ for all $a, b, n \in N$. We first check the case n = 1 for all a, b. Three
applications of definition (2.4) give


$(a + b) + 1 = (a + b)' = a + b' = a + (b + 1)$.


Next, assume the associative law true for a particular value of n and for all a, b.
Then we verify it for n' as follows:


$(a + b) + n' = (a + b) + (n + 1)$      (definition)
$= ((a + b) + n) + 1$      (case n = 1)
$= (a + (b + n)) + 1$      (induction hypothesis)
$= a + ((b + n) + 1)$      (case n = 1)
$= a + (b + (n + 1))$      (case n = 1)
$= a + (b + n')$      (definition). □


Proof of the commutative law for multiplication, assuming that the commutative
law for addition has been proved. We first prove the following lemma:


(2.5)    $m' \cdot n = m \cdot n + n$.

The case n = 1 is clear: $m' \cdot 1 = m' = m + 1 = m \cdot 1 + 1$. So assume that (2.5)
is true for a particular n and for all values of m. We check it for n':


$m' \cdot n' = m' \cdot n + m' = m' \cdot n + (m + 1)$      (definition)
$= (m \cdot n + n) + (m + 1)$      (induction)
$= (m \cdot n + m) + (n + 1)$      (various laws for addition)
$= m \cdot n' + n'$      (definition).


Next, we check that $1 \cdot n = n$ by induction on n. Finally, we show that $m \cdot n = n \cdot m$
by induction on n, knowing that $m \cdot 1 = m = 1 \cdot m$: Assume it true for n.
Then $m \cdot n' = m \cdot n + m = n \cdot m + m = n' \cdot m$, as required. □


The proofs of other properties of addition and multiplication follow similar lines. 


We now turn to the definition of polynomial rings. We can define the notion of 
a polynomial with coefficients in any ring R to mean a linear combination of powers 
of the variable: 


(2.6)    $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$,


where $a_i \in R$. Such expressions are often called formal polynomials, to distinguish
them from polynomial functions. Every formal polynomial with real coefficients determines
a polynomial function on the real numbers.

The variable x appearing in (2.6) is an arbitrary symbol, and the monomials $x^i$
are considered independent. This means that if


$g(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0$


is another polynomial with coefficients in R, then f(x) and g(x) are equal if and only
if $a_i = b_i$ for all i = 0, 1, 2,....

The degree of a nonzero polynomial is the largest integer k such that the
coefficient $a_k$ of $x^k$ is not zero. (The degree of the zero polynomial is considered indeterminate.)
The coefficient of highest degree of a polynomial which is not zero is
called its leading coefficient, and a monic polynomial is one whose leading
coefficient is 1.

The possibility that some of the coefficients of a polynomial may be zero creates
a nuisance. We have to disregard terms with zero coefficient: $x^2 + 3 =
0x^3 + x^2 + 3$, for example. So the polynomial f(x) has more than one representation
(2.6). One way to standardize notation is to list the nonzero coefficients only,
that is, to omit from (2.6) all terms $0x^i$. But zero coefficients may be produced in the
course of computations, and they will have to be thrown out. Another possibility is
to insist that the highest degree coefficient $a_n$ of (2.6) be nonzero and to list all those
of lower degree. The same problem arises. Such conventions therefore require a discussion
of special cases in the description of the ring structure. This is irritating, because
the ambiguity caused by zero coefficients is not an interesting point.

One way around the notational problem is to list the coefficients of all monomials,
zero or not. This isn’t good for computation, but it allows efficient
verification of the ring axioms. So for the purpose of defining the ring operations,
we will write a polynomial in the standard form


(2.7)    $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots$,


where the coefficients $a_i$ are all in the ring R and only finitely many of the
coefficients are different from zero. Formally, the polynomial (2.7) is determined by
its vector (or sequence) of coefficients $a_i$:


(2.8)    $a = (a_0, a_1, \ldots)$,


where $a_i \in R$ and all but a finite number of $a_i$ are zero. Every such vector corresponds
to a polynomial. In case R is a field, these infinite vectors form the vector
space Z with the infinite basis $e_i$ which was defined in Chapter 3 (5.2d). The vector
$e_i$ corresponds to the monomial $x^i$, and the monomials form a basis of the space of
all polynomials.

Addition and multiplication of polynomials mimic the familiar operations on 
real polynomial functions. Let f(x) be as above, and let 


(2.9)    $g(x) = b_0 + b_1 x + b_2 x^2 + \cdots$


be another polynomial with coefficients in the same ring R, determined by the vector
$b = (b_0, b_1, \ldots)$. The sum of f and g is


(2.10)    $f(x) + g(x) = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 + \cdots = \sum_k (a_k + b_k) x^k$,


which corresponds to vector addition: $a + b = (a_0 + b_0, a_1 + b_1, \ldots)$.

The product of two polynomials f, g is computed by multiplying term by term 
and collecting coefficients of the same degree in x. If we expand the product using 
the distributive law, but without collecting terms, we obtain 


(2.11)    $f(x)g(x) = \sum_{i,j} a_i b_j x^{i+j}$.


Note that there are finitely many nonzero coefficients $a_i b_j$. This is a correct formula,
but the right side is not in the standard form (2.7) because the same monomial $x^n$
appears many times, once for each pair i, j of indices such that $i + j = n$. So
terms have to be collected to put the right side back into standard form. This leads to
the definition


$f(x)g(x) = p_0 + p_1 x + p_2 x^2 + \cdots$,


where


(2.12)    $p_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0 = \sum_{i+j=k} a_i b_j$.


However, it may be desirable to defer the collection of terms for a while when mak- 
ing computations. 
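
Formulas (2.10) and (2.12) translate directly into operations on coefficient vectors. The following is a minimal sketch, assuming a polynomial is stored as a finite Python list of coefficients with the constant term first (the trailing zeros of the infinite vector (2.8) are simply left off); the names `poly_add` and `poly_mul` are illustrative.

```python
def poly_add(a, b):
    """Vector addition of coefficient lists, as in (2.10)."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))   # pad the shorter list with zero coefficients
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_mul(a, b):
    """Product of coefficient lists: p_k is the sum of a_i * b_j over i + j = k, as in (2.12)."""
    if not a or not b:
        return []                # the empty list stands for the zero polynomial
    p = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj  # collect the term a_i b_j x^(i+j)
    return p

f = [1, 2]        # 1 + 2x
g = [3, 0, 1]     # 3 + x^2
total = poly_add(f, g)      # 4 + 2x + x^2
product = poly_mul(f, g)    # 3 + 6x + x^2 + 2x^3
```

The double loop in `poly_mul` is the uncollected expansion (2.11); adding into `p[i + j]` is the collection of terms.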
(2.13) Proposition. There is a unique commutative ring structure on the set of
polynomials R[x] having these properties: 


(a) Addition of polynomials is vector addition (2.10). 

(b) Multiplication of monomials is given by the rule (2.12). 

(c) The ring R is a subring of R[x], when the elements of R are identified with the 
constant polynomials. 


The proof of this proposition is notationally unpleasant without having any interesting
features, so we omit it. □


Polynomials are fundamental to the theory of rings, and we must also consider
polynomials, such as $x^2y^2 + 4x^3 - 3x^2y - 4y^2 + 2$, in several variables. There is
no major change in the definitions.

Let $x_1, \ldots, x_n$ be variables. A monomial is a formal product of these variables,
of the form

$x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$,


where the exponents $i_\nu$ are nonnegative integers. The n-tuple $(i_1, \ldots, i_n)$ of exponents
determines the monomial. Such an n-tuple is called a multi-index, and vector notation
$i = (i_1, \ldots, i_n)$ for multi-indices is very convenient. Using it, we may write the
monomial symbolically as


(2.14)    $x^i = x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$.


The monomial $x^0$, where 0 = (0,...,0), is denoted by 1.

A polynomial with coefficients in a ring R is a finite linear combination of monomials,
with coefficients in R. Using the shorthand notation (2.14), any polynomial
$f(x) = f(x_1, \ldots, x_n)$ can be written in exactly one way in the form


(2.15)    $f(x) = \sum_i a_i x^i$,


where i runs through all multi-indices $(i_1, \ldots, i_n)$, the coefficients $a_i$ are in R, and
only finitely many of these coefficients are different from zero.

A polynomial which is the product of a monomial by a nonzero element of R is
also called a monomial. Thus


(2.17)    $m = r x^i$


is a monomial if $r \in R$ is not zero and if $x^i$ is as above (2.14). A monomial can be
thought of as a polynomial which has exactly one nonzero coefficient.

Using multi-index notation, formulas (2.10) and (2.12) define addition and
multiplication of polynomials in several variables, and the analogue of Proposition
(2.13) is true.

The ring of polynomials with coefficients in R is denoted by one of the symbols


(2.16)    $R[x_1, \ldots, x_n]$ or $R[x]$,

where the symbol x is understood to refer to the set of variables $(x_1, \ldots, x_n)$. When
no set of variables has been introduced, R[x] refers to the polynomial ring in one
variable x.


3. HOMOMORPHISMS AND IDEALS 


A homomorphism $\varphi: R \to R'$ from one ring to another is a map which is compatible
with the laws of composition and which carries 1 to 1, that is, a map such that


(3.1)    $\varphi(a + b) = \varphi(a) + \varphi(b)$,  $\varphi(ab) = \varphi(a)\varphi(b)$,  $\varphi(1_R) = 1_{R'}$,


for all $a, b \in R$. An isomorphism of rings is a bijective homomorphism. If there is
an isomorphism $R \to R'$, the two rings are said to be isomorphic.

A word about the third part of (3.1) is in order. The assumption that a homomorphism
$\varphi$ is compatible with addition implies that it is a group homomorphism
$R^+ \to R'^+$. We know that a group homomorphism carries the identity to the identity,
so $\varphi(0) = 0$. But R is not a group with respect to $\times$, and we can’t conclude that
$\varphi(1) = 1$ from compatibility with multiplication. So the condition $\varphi(1) = 1$ must
be listed separately. For example, the zero map $R \to R'$ sending all elements of R
to zero is compatible with + and $\times$, but it doesn’t send 1 to 1 unless 1 = 0 in R'.
The zero map isn’t a ring homomorphism unless R' is the zero ring [see (1.6)].

The most important ring homomorphisms are those obtained by evaluating
polynomials. Evaluation of real polynomials at a real number a defines a homomorphism

(3.2)    $\mathbb{R}[x] \to \mathbb{R}$, sending $p(x) \rightsquigarrow p(a)$.

We can also evaluate real polynomials at a complex number such as i, to obtain a homomorphism

(3.3)    $\mathbb{R}[x] \to \mathbb{C}$, sending $p(x) \rightsquigarrow p(i)$.


The general formulation of the principle of evaluation of polynomials is this: 


(3.4) Proposition. Substitution Principle: Let $\varphi: R \to R'$ be a ring homomorphism.


(a) Given an element $\alpha \in R'$, there is a unique homomorphism $\Phi: R[x] \to R'$
which agrees with the map $\varphi$ on constant polynomials and which sends
$x \rightsquigarrow \alpha$.

(b) More generally, given elements $\alpha_1, \ldots, \alpha_n \in R'$, there is a unique homomorphism
$\Phi: R[x_1, \ldots, x_n] \to R'$ from the polynomial ring in n variables to R',
which agrees with $\varphi$ on constant polynomials and which sends $x_\nu \rightsquigarrow \alpha_\nu$, for
$\nu = 1, \ldots, n$.


Proof. With vector notation for indices, the proof of (b) is the same as that of
(a). Let us denote the image of an element $r \in R$ in R' by r'. Using the fact that $\Phi$
is a homomorphism which restricts to $\varphi$ on R and sends $x_\nu$ to $\alpha_\nu$, we find that it acts
on a polynomial $f(x) = \sum r_i x^i$ by sending


(3.5)    $\sum r_i x^i \rightsquigarrow \sum \varphi(r_i)\alpha^i = \sum r_i' \alpha^i$.


In other words, $\Phi$ acts on the coefficients of a polynomial as $\varphi$, and it substitutes $\alpha$
for x. Since this formula describes $\Phi$ for us, we have proved the uniqueness of the
substitution homomorphism. To prove its existence, we take this formula as the
definition of $\Phi$, and we show that this map is a homomorphism $R[x] \to R'$. It is
easy to show that $\Phi$ sends 1 to 1 and that it is compatible with addition of polynomials.
Compatibility with multiplication can be checked using formula (2.11):


$\Phi(fg) = \Phi(\sum_{i,j} a_i b_j x^{i+j}) = \sum_{i,j} \Phi(a_i b_j x^{i+j}) = \sum_{i,j} a_i' b_j' \alpha^{i+j}$

$= (\sum_i a_i' \alpha^i)(\sum_j b_j' \alpha^j) = \Phi(f)\Phi(g)$. □
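
Formula (3.5) can be tried out numerically. The sketch below assumes integer coefficient lists with the constant term first, takes the inclusion of the integers into the rational numbers (Python's `Fraction`) as the coefficient homomorphism, and substitutes α = 1/2; the function names are illustrative. It checks compatibility with multiplication for one pair of polynomials, which is a spot check and not a proof.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of coefficient lists (constant term first), as in (2.12)."""
    if not a or not b:
        return []
    p = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj
    return p

def substitute(coeffs, alpha, phi=lambda r: r):
    """Apply phi to each coefficient and substitute alpha for x, as in (3.5)."""
    acc = phi(0)
    for c in reversed(coeffs):   # Horner's scheme, highest degree first
        acc = acc * alpha + phi(c)
    return acc

f = [1, 2]        # 1 + 2x
g = [3, 0, 1]     # 3 + x^2
alpha = Fraction(1, 2)

lhs = substitute(poly_mul(f, g), alpha, Fraction)                       # image of fg
rhs = substitute(f, alpha, Fraction) * substitute(g, alpha, Fraction)   # product of images
```

Both sides come out equal, as compatibility with multiplication requires.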


Here is an example of the Substitution Principle in which the coefficient ring R
changes: Let $\psi: R \to R_1$ be a ring homomorphism. Composing $\psi$ with the inclusion
of $R_1$ as a subring of $R_1[x]$, we obtain a homomorphism $\varphi: R \to R_1[x]$. The
Substitution Principle asserts that there is a unique extension of $\varphi$ to a homomorphism
$\Phi: R[x] \to R_1[x]$ which sends $x \rightsquigarrow x$. This is the map which operates on
the coefficients of a polynomial, leaving the variable x fixed. If we denote $\psi(a)$ by
a', then it sends a polynomial $a_n x^n + \cdots + a_1 x + a_0$ to $a_n' x^n + \cdots + a_1' x + a_0'$.

An important case is the homomorphism $\mathbb{Z} \to \mathbb{F}_p$, where $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ is the
field with p elements. This map extends to a homomorphism


(3.6)    $\mathbb{Z}[x] \to \mathbb{F}_p[x]$, sending
         $f(x) = a_n x^n + \cdots + a_0 \rightsquigarrow \bar{a}_n x^n + \cdots + \bar{a}_0 = \bar{f}(x)$,


where $\bar{a}_i$ denotes the residue class of $a_i$ modulo p. It is natural to call the polynomial
$\bar{f}(x)$ the residue of f(x) modulo p.
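
The residue map (3.6) is easy to experiment with. The following sketch assumes integer coefficient lists with the constant term first and represents a residue class by its representative in 0, ..., p − 1; `reduce_mod` and `poly_mul` are hypothetical helper names. It checks on one example that reducing a product gives the same result as multiplying the reductions.

```python
def poly_mul(a, b):
    """Product of coefficient lists (constant term first), as in (2.12)."""
    if not a or not b:
        return []
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    return prod

def reduce_mod(coeffs, p):
    """Residue of an integer polynomial modulo p, with trailing zero coefficients dropped."""
    r = [c % p for c in coeffs]
    while r and r[-1] == 0:
        r.pop()
    return r

p = 3
f = [1, 2, 3]     # 1 + 2x + 3x^2
g = [4, -1]       # 4 - x

lhs = reduce_mod(poly_mul(f, g), p)                                  # residue of fg
rhs = reduce_mod(poly_mul(reduce_mod(f, p), reduce_mod(g, p)), p)    # product of residues
```

Note that the residue of f has degree 1 here, since its leading coefficient 3 is divisible by p.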

The Substitution Principle is also an efficient way to prove that various constructions
of polynomial rings are equivalent; the isomorphism


$R[x, y] \approx R[x][y]$


is a typical example. Here the right side stands for the ring of polynomials in y
whose coefficients are polynomials in x. The statement that these rings are isomorphic
is a formalization of the procedure of collecting terms of like degree in y in a
polynomial f(x, y), to write it as a polynomial in y. For example,


$x^2y^2 + 4x^3 - 3x^2y - 4y^2 + 2 = (x^2 - 4)y^2 - (3x^2)y + (4x^3 + 2)$.


(3.7) Corollary. Let $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_n)$ denote sets of variables.
There is a unique isomorphism $R[x, y] \to R[x][y]$ which is the identity on R and
which sends the variables to themselves.

Proof. Note that R is a subring of R[x], and that R[x] is a subring of R[x][y].
So R is also a subring of R[x][y]. Consider the inclusion map $\varphi: R \to R[x][y]$. The
Substitution Principle (3.4) tells us that there is a unique homomorphism
$\Phi: R[x, y] \to R[x][y]$ which extends this map and sends the variables $x_\mu, y_\nu$ wherever
we like. So we can send the variables to themselves. The map $\Phi$ thus constructed is
the required isomorphism. We can show that it has an inverse by using the Substitution
Principle once more: We note that R[x] is a subring of R[x, y], so we can extend
the inclusion map $\psi: R[x] \to R[x, y]$ to a map $\Psi: R[x][y] \to R[x, y]$ by sending
$y_\nu$ to itself. The composed homomorphism $\Psi\Phi: R[x, y] \to R[x, y]$ is the identity
on R and on $\{x_\mu, y_\nu\}$. By the uniqueness of the substitution homomorphism, $\Psi\Phi$ is
the identity map. Similarly, $\Phi\Psi$ is the identity. This proves that $\Phi$ is an isomorphism. □


Since a real polynomial f(x) can be evaluated at a real number, it defines a 
polynomial function on the real line. The term polynomial is often used to refer to a 
function obtained in this way, and not much danger is involved in doing this, be- 
cause we can recover the polynomial from its function: 


(3.8) Proposition. Let $\mathcal{R}$ denote the ring of continuous real-valued functions on
$\mathbb{R}^n$. The map $\varphi: \mathbb{R}[x_1, \ldots, x_n] \to \mathcal{R}$ sending a polynomial to its associated polynomial
function is an injective homomorphism.


Proof. The existence of this homomorphism follows from the Substitution
Principle. Let us prove injectivity. It is enough to show that if the function associated
to a polynomial f(x) is the zero function, then f(x) is the zero polynomial. Let
the associated function be $\tilde{f}(x)$. If $\tilde{f}(x)$ is identically zero, then all its derivatives are
zero too. On the other hand, we can differentiate a formal polynomial by using the
rule for differentiating polynomial functions. If some coefficient of our polynomial f
is not zero, then the constant term of a suitable derivative will be nonzero too. So
that derivative will not vanish at the origin. Therefore $\tilde{f}(x)$ can’t be the zero function. □


Another important example of a ring homomorphism is the map from the in- 
tegers to an arbitrary ring: 
(3.9) Proposition. There is exactly one homomorphism

$\varphi: \mathbb{Z} \to R$

from the ring of integers to an arbitrary ring R. It is the map defined by $\varphi(n) =$
“n times $1_R$” $= 1_R + \cdots + 1_R$ (n terms) if n > 0, and $\varphi(-n) = -\varphi(n)$.


Sketch of Proof. Let $\varphi: \mathbb{Z} \to R$ be a homomorphism. By the definition of homomorphism,
$\varphi(1) = 1_R$, and $\varphi(n + 1) = \varphi(n) + \varphi(1)$. So $\varphi$ is determined on
the natural numbers by the recursive definition


$\varphi(1) = 1$ and $\varphi(n') = \varphi(n) + 1$,

where ' denotes the successor function (2.1b). This formula, together with $\varphi(-n) =
-\varphi(n)$ if n > 0 and $\varphi(0) = 0$, determines $\varphi$ uniquely. So the above map is the only
possible one. To give a formal proof that this map is a homomorphism, we must go
back to Peano’s axioms. Let us verify that $\varphi$ is compatible with addition of positive
integers. To prove that $\varphi(m + n) = \varphi(m) + \varphi(n)$, we note that this is true when
n = 1, by the definition of $\varphi$. Assume it true for all m and some particular n. Then
we prove it for all m and for n':


$\varphi(m + n') = \varphi((m + n) + 1)$      (properties of addition of integers)
$= \varphi(m + n) + 1$      (definition of $\varphi$)
$= \varphi(m) + \varphi(n) + 1$      (induction hypothesis)
$= \varphi(m) + \varphi(n')$      (definition of $\varphi$).


By induction, $\varphi(m + n) = \varphi(m) + \varphi(n)$ for all m and n. We leave the proof of
compatibility with multiplication of positive integers as an exercise. □


This proposition allows us to identify the images of the integers in an arbitrary
ring R. Thus we can interpret the symbol 3 as the element 1 + 1 + 1 in R, and we
can interpret an integer polynomial such as $3x^2 + 2x$ as an element of the polynomial
ring R[x].

We now go back to an arbitrary ring homomorphism $\varphi: R \to R'$. The kernel
of $\varphi$ is defined in the same way as the kernel of a group homomorphism:


$\ker \varphi = \{a \in R \mid \varphi(a) = 0\}$.

As you will recall, the kernel of a group homomorphism is a subgroup, and in addition
it is normal [Chapter 2 (4.9)]. Similarly, the kernel of a ring homomorphism is
closed under the ring operations of addition and multiplication, and it also has a
stronger property than closure under multiplication:


(3.10)    If $a \in \ker \varphi$ and $r \in R$, then $ra \in \ker \varphi$.


For if $\varphi(a) = 0$, then $\varphi(ra) = \varphi(r)\varphi(a) = \varphi(r)0 = 0$. On the other hand, $\ker \varphi$
does not contain the unit element 1 of R, and so the kernel is not a subring, unless it
is the whole ring R. (If $1 \in \ker \varphi$, then $r = r1 \in \ker \varphi$ for all $r \in R$.) Moreover,
if $\ker \varphi = R$, then $\varphi$ is the zero map, and by what was said above, R' is the zero
ring.

For example, let $\varphi$ be the homomorphism $\mathbb{R}[x] \to \mathbb{R}$ defined by evaluation at
the real number 2. Then $\ker \varphi$ is the set of polynomials which have 2 as a root. It
can also be described as the set of polynomials divisible by x − 2.
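
The two descriptions of this kernel can be compared with a short computation. The sketch below assumes coefficient lists with the constant term first and uses synthetic division, a technique not named in the text, to divide by x − c; the function name `divide_by_linear` is illustrative. The remainder it returns is the value p(c), so the remainder is zero exactly when c is a root.

```python
def divide_by_linear(coeffs, c):
    """Divide p(x) by x - c; return (quotient, remainder), where remainder = p(c)."""
    if not coeffs:
        return [], 0
    partial = []
    acc = 0
    for a in reversed(coeffs):   # Horner's scheme, highest degree first
        acc = acc * c + a
        partial.append(acc)
    remainder = partial.pop()    # the last accumulated value is p(c)
    partial.reverse()            # quotient coefficients, constant term first
    return partial, remainder

in_kernel = divide_by_linear([6, -5, 1], 2)    # x^2 - 5x + 6 = (x - 2)(x - 3)
not_in_kernel = divide_by_linear([1, 0, 1], 2) # x^2 + 1 = (x - 2)(x + 2) + 5
```

In the first case the remainder is 0 and the quotient is x − 3; in the second the remainder 5 is the value of x^2 + 1 at 2.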

The property of the kernel of a ring homomorphism, that it is closed under
multiplication by arbitrary elements of the ring, is abstracted in the concept of an
ideal. An ideal I of a ring R is, by definition, a subset of R with these properties:


(3.11)


(i) I is a subgroup of $R^+$;
(ii) If $a \in I$ and $r \in R$, then $ra \in I$.

This peculiar term “ideal” is an abbreviation of “ideal element,” which was formerly
used in number theory. We will see in Chapter 11 how the term arose. Property (ii)
implies that an ideal is closed under multiplication, but it is stronger. A good way to
think of properties (i) and (ii) together is this equivalent formulation:


(3.12)    I is not empty, and a linear combination $r_1 a_1 + \cdots + r_k a_k$
          of elements $a_i \in I$ with coefficients $r_i \in R$ is in I.


In any ring R, the set of multiples of a particular element a, or equivalently, 
the set of elements divisible by a, forms an ideal called the principal ideal generated 
by a. This ideal will be denoted in one of the following ways: 


(3.13)    $(a) = aR = Ra = \{ra \mid r \in R\}$.


Thus the kernel of the homomorphism $\mathbb{R}[x] \to \mathbb{R}$ defined by evaluation at 2 may
be denoted by (x − 2) or by $(x - 2)\mathbb{R}[x]$. Actually the notation (a) for a principal
ideal, though convenient, is ambiguous because the ring is not mentioned. For
instance, (x − 2) may stand for an ideal in $\mathbb{R}[x]$ or in $\mathbb{Z}[x]$, depending on the circumstances.
When there are several rings around, a different notation may be
preferable.

We may also consider the ideal I generated by a set of elements $a_1, \ldots, a_n$ of R,
which is defined to be the smallest ideal containing the elements. It can be described
as the set of all linear combinations


(3.14)    $r_1 a_1 + \cdots + r_n a_n$,


with coefficients $r_i$ in the ring. For if an ideal contains $a_1, \ldots, a_n$, then (3.12) tells us
that it contains every linear combination of these elements. On the other hand, the
set of linear combinations is closed under addition, subtraction, and multiplication
by elements of R. Hence it is the ideal I. This ideal is often denoted by


(3.15)    $(a_1, \ldots, a_n)$.


For example, if R is the ring Z[x] of integer polynomials, the notation (2, x)
stands for the ideal of linear combinations of 2 and x with integer polynomial
coefficients. This ideal can also be described as the set of all integer polynomials
f(x) whose constant term is divisible by 2. It is the kernel of the homomorphism
$\mathbb{Z}[x] \to \mathbb{Z}/2\mathbb{Z}$ defined by $f(x) \rightsquigarrow$ (residue of f(0) (modulo 2)).
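
The two descriptions of the ideal (2, x) can be matched up explicitly. This is a minimal sketch, assuming integer coefficient lists with the constant term first; the function names are hypothetical. Given f with even constant term, `decompose` returns one particular pair of coefficients with f = 2·r1 + x·r2 (many other choices exist), and `recombine` rebuilds f from them.

```python
def in_ideal_2_x(coeffs):
    """Membership test for (2, x) in Z[x]: the constant term must be even."""
    constant = coeffs[0] if coeffs else 0
    return constant % 2 == 0

def decompose(coeffs):
    """Return (r1, r2) with f = 2*r1 + x*r2, taking r1 = f(0)/2 and r2 = (f - f(0))/x."""
    if not in_ideal_2_x(coeffs):
        raise ValueError("constant term is odd, so f is not in (2, x)")
    constant = coeffs[0] if coeffs else 0
    return [constant // 2], list(coeffs[1:])

def recombine(r1, r2):
    """Compute the linear combination 2*r1 + x*r2 as a coefficient list."""
    two_r1 = [2 * c for c in r1]
    x_r2 = [0] + list(r2)            # multiplying by x shifts the coefficients up
    n = max(len(two_r1), len(x_r2))
    two_r1 += [0] * (n - len(two_r1))
    x_r2 += [0] * (n - len(x_r2))
    return [u + v for u, v in zip(two_r1, x_r2)]

f = [4, 3, 5]                 # 4 + 3x + 5x^2
r1, r2 = decompose(f)         # 2*(2) + x*(3 + 5x)
```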

For the rest of this section, we will describe ideals in some simple cases. In 
any ring R, the set consisting of zero alone is an ideal, called the zero ideal. It is ob- 
viously a principal ideal, as is the whole ring. Being generated as an ideal by the ele- 
ment 1, R is called the unit ideal, often denoted by (1). The unit ideal is the only 
ideal which contains a unit. An ideal I is said to be proper if it is not (0) or (1).

Fields can be characterized by the fact that they have no proper ideals: 


(3.16) Proposition. 


(a) Let F be a field. The only ideals of F are the zero ideal and the unit ideal. 
(b) Conversely, if a ring R has exactly two ideals, then R is a field. 
Let us prove (b). Assume that R has exactly two ideals. The properties that distinguish
fields among rings are that $1 \neq 0$ and that every nonzero element $a \in R$ has a
multiplicative inverse. As we saw above, 1 = 0 occurs only in the zero ring, which
has one element. This ring has only one ideal. Since our ring has two ideals, $1 \neq 0$
in R. The two ideals (1) and (0) are different, so they are the only two ideals of R.

We now show that every nonzero element of R has an inverse. Let $a \in R$ be a
nonzero element, and consider the principal ideal (a). Then $(a) \neq (0)$ because
$a \in (a)$. Therefore (a) = (1). This implies that 1 is a multiple, say ra, of a. The
equation ar = 1 shows that a has an inverse. □


(3.17) Corollary. Let F be a field and let R' be a nonzero ring. Every homomorphism
$\varphi: F \to R'$ is injective.


Proof. We apply (3.16). If $\ker \varphi = (1)$, then $\varphi$ is the zero map. But the zero
map isn’t a homomorphism because R' isn’t the zero ring. Therefore $\ker \varphi = (0)$. □


It is also easy to determine the ideals in the ring of integers. 


(3.18) Proposition. Every ideal in the ring Z of integers is a principal ideal. 


This is because every subgroup of the additive group $\mathbb{Z}^+$ of integers is of the form
nZ [Chapter 2 (2.3)], and these subgroups are precisely the principal ideals. □


The characteristic of a ring R is the nonnegative integer n which generates the
kernel of the homomorphism $\varphi: \mathbb{Z} \to R$ (3.9). This means that n is the smallest
positive integer such that “n times $1_R$” = 0 or, if the kernel is (0), the characteristic
is zero (see Chapter 3, Section 2). Thus R, C, and Z have characteristic zero, while
the field $\mathbb{F}_p$ with p elements has characteristic p.
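
The description of the characteristic as the smallest n with “n times 1” = 0 suggests a direct search. The sketch below is an illustration only: the ring is passed in through its elements `one` and `zero` and an addition function `add`, all of which are assumptions of this example, and the search gives up after `max_steps` additions. A result of `None` therefore means only that no such n was found in that range; a finite search cannot establish characteristic zero.

```python
def characteristic(one, zero, add, max_steps=10_000):
    """Smallest n >= 1 with 1 + ... + 1 (n terms) equal to zero, or None if none is found."""
    total = one                      # total holds "n times 1" at the start of step n
    for n in range(1, max_steps + 1):
        if total == zero:
            return n
        total = add(total, one)
    return None

char_mod_5 = characteristic(1, 0, lambda a, b: (a + b) % 5)       # the field with 5 elements
char_mod_6 = characteristic(1, 0, lambda a, b: (a + b) % 6)       # Z/6Z, which is not a field
char_int = characteristic(1, 0, lambda a, b: a + b, max_steps=100)  # no n found for Z
```

For the zero ring, where one equals zero, the search returns 1, which matches the fact that the kernel of the map from the integers is then the unit ideal.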

The proof that every ideal of the ring of integers is principal can be adapted to
show that every ideal in the polynomial ring F[x] is principal. To prove this, we
need division with remainder for polynomials.


(3.19) Proposition. Let R be a ring and let f, g be polynomials in R[x]. Assume
that the leading coefficient of f is a unit in R. (This is true, for instance, if f is a
monic polynomial.) Then there are polynomials $q, r \in R[x]$ such that


$g(x) = f(x)q(x) + r(x)$,

and such that the degree of the remainder r is less than the degree of f or else r = 0.
This division with remainder can be proved by induction on the degree of g. □

Note that when the coefficient ring is a field, the assumption that the leading
coefficient of f is a unit is satisfied, provided only that there is a leading coefficient,
that is, that $f \neq 0$.
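
The induction on the degree of g amounts to repeatedly cancelling the leading term of the running remainder. The following is a minimal sketch of that procedure over the field of rational numbers, assuming coefficient lists with the constant term first and Python's `Fraction` for exact arithmetic; the names `trim` and `poly_divmod` are illustrative.

```python
from fractions import Fraction

def trim(p):
    """Copy of p with trailing zero coefficients removed ([] is the zero polynomial)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(g, f):
    """Return (q, r) with g = f*q + r and r = 0 or deg r < deg f, for nonzero f."""
    r = [Fraction(c) for c in trim(g)]
    f = [Fraction(c) for c in trim(f)]
    if not f:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(f) + 1, 0)
    while len(r) >= len(f):
        shift = len(r) - len(f)
        coef = r[-1] / f[-1]          # uses that the leading coefficient of f is a unit
        q[shift] = coef
        for i, c in enumerate(f):     # subtract coef * x^shift * f, cancelling the leading term
            r[shift + i] -= coef * c
        r = trim(r)
    return trim(q), r

q1, r1 = poly_divmod([-1, 0, 0, 1], [-1, 1])   # x^3 - 1 divided by x - 1
q2, r2 = poly_divmod([1, 0, 1], [1, 2])        # x^2 + 1 divided by 2x + 1
```

The only division performed is by the leading coefficient of f, which is why the hypothesis of (3.19) is that this coefficient is a unit.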


(3.20) Corollary. Let g(x) be a monic polynomial in R[x], and let $\alpha$ be an element
of R such that $g(\alpha) = 0$. Then $x - \alpha$ divides g in R[x]. □

(3.21) Proposition. Let F be a field. Every ideal in the ring F[x] of polynomials
in a single variable x is a principal ideal.


Proof. Let I be an ideal of F[x]. Since the zero ideal is principal, we may assume
that $I \neq (0)$. The first step in finding a generator for a nonzero subgroup of Z
is to choose its smallest positive element. Our substitute here is to choose a nonzero
polynomial f in I of minimal degree. We claim that I is the principal ideal generated
by f. It follows from the definition of an ideal that the principal ideal (f) is contained
in I. To prove that $I \subset (f)$, we use division with remainder to write
g = fq + r, where r has lower degree than f, unless it is zero. Now if g is in the
ideal I, then since $f \in I$ the definition of an ideal shows that r = g − fq is in I too.
Since f has minimal degree among nonzero elements, the only possibility is that
r = 0. Thus f divides g, as required. □


The proof of the following corollary is similar to that of (2.6) in Chapter 2. 


(3.22) Corollary. Let F be a field, and let f, g be polynomials in F[x] which are
not both zero. There is a unique monic polynomial d(x) called the greatest common
divisor of f and g, with the following properties:

(a) d generates the ideal (f, g) of F[x] generated by the two polynomials f, g.

(b) d divides f and g.

(c) If h is any divisor of f and g, then h divides d.

(d) There are polynomials $p, q \in F[x]$ such that d = pf + qg. □
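
Property (d) can be made concrete with the extended Euclidean algorithm, a technique that the text does not spell out and which is used here as one possible way to compute d, p, and q. The sketch works over the rational numbers with Python's `Fraction`, stores coefficients with the constant term first, and uses hypothetical helper names throughout.

```python
from fractions import Fraction

def trim(p):
    """Copy of p with trailing zero coefficients removed ([] is the zero polynomial)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])

def sub(a, b):
    return add(a, [-c for c in b])

def mul(a, b):
    if not a or not b:
        return []
    p = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj
    return trim(p)

def poly_divmod(g, f):
    """Division with remainder by a nonzero f with Fraction coefficients."""
    r, f = trim(g), trim(f)
    if not f:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(f) + 1, 0)
    while len(r) >= len(f):
        shift, coef = len(r) - len(f), r[-1] / f[-1]
        q[shift] = coef
        for i, c in enumerate(f):
            r[shift + i] -= coef * c
        r = trim(r)
    return trim(q), r

def gcd_with_cofactors(f, g):
    """Return (d, p, q) with d monic and d = p*f + q*g, for f, g not both zero."""
    r0 = trim([Fraction(c) for c in f])
    r1 = trim([Fraction(c) for c in g])
    if not r0 and not r1:
        raise ValueError("f and g must not both be zero")
    p0, q0 = [Fraction(1)], []       # invariant: r0 = p0*f + q0*g
    p1, q1 = [], [Fraction(1)]       # invariant: r1 = p1*f + q1*g
    while r1:
        quo, rem = poly_divmod(r0, r1)
        r0, r1 = r1, rem
        p0, p1 = p1, sub(p0, mul(quo, p1))
        q0, q1 = q1, sub(q0, mul(quo, q1))
    lead = r0[-1]                    # divide by the leading coefficient to make d monic
    return ([c / lead for c in r0], [c / lead for c in p0], [c / lead for c in q0])

f = [-1, 0, 1]      # x^2 - 1 = (x - 1)(x + 1)
g = [2, -3, 1]      # x^2 - 3x + 2 = (x - 1)(x - 2)
d, p, q = gcd_with_cofactors(f, g)
```

For this pair the result is d = x − 1 with p = 1/3 and q = −1/3.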


4. QUOTIENT RINGS AND RELATIONS IN A RING 


Let J be an ideal of a ring R. The cosets of the additive subgroup /* of R* are the 
subsets 


Qo) eee 


It follows from what has been proved for groups that the set of cosets R/] = R is a 
group under addition. It is also a ring: 


(4.1) Theorem. Let / be an ideal of a ring R. 


(a) There is a unique ring structure on the set of cosets R = R/I such that the 
canonical map 7: R——>R sending a~~~a = a + I is a homomorphism. 


(b) The kernel of 7 is /. 


Proof. This proof has already been carried out in the special case that R is the 
ring of integers (Chapter 2, Section 9). We want to put a ring structure on R̄ with 
the required properties, and if we forget about multiplication and consider only the 
addition law, the proof has already been given [Chapter 2 (10.5)]. What is left to do 
is to define multiplication. Let x, y ∈ R̄, and say that x = ā = a + I and 
y = b̄ = b + I. We would like to define the product to be xy = π(ab) = ab + I. 
In contrast with coset multiplication in a group [Chapter 2 (10.1)], the set of products 


P = {rs | r ∈ a + I, s ∈ b + I} 


is not always a coset of I. However, as in the case of the ring of integers, the set P is 
always contained in the single coset ab + I: If we write r = a + u and s = b + v 
with u, v ∈ I, then 


(a + u)(b + v) = ab + (av + bu + uv), 


and since I is an ideal, av + bu + uv ∈ I. This is all that is needed to define the 
product coset: It is the coset which contains the set P. This coset is unique because 
the cosets partition R. The proof of the remaining assertions closely follows the 
pattern of Chapter 2, Section 9. 
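
The key point of the proof, that the set P of products lies in the single coset ab + I without necessarily filling it, can be checked numerically. The sketch below assumes R = Z and I = 6Z with a = 2 and b = 3, and samples only a finite window of representatives; these choices are illustrative.

```python
n, a, b = 6, 2, 3            # I = nZ, cosets a + I and b + I
window = range(-20, 21)      # finite sample of the multipliers u, v

# products rs with r in a + I and s in b + I
products = {(a + n * u) * (b + n * v) for u in window for v in window}

# every product lies in the coset ab + I
all_in_coset = all((r - a * b) % n == 0 for r in products)

# 0 lies in ab + I = 6Z, but it is never a product, because neither
# 2 + 6u nor 3 + 6v can vanish
misses_zero = 0 not in products
```

So P is a proper subset of the coset 6 + 6Z here, which is why the product coset is defined as the coset containing P rather than as P itself.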


As in Chapter 6 (8.4) and Chapter 2 (10.9), one can show the following: 


(4.2) Proposition. Mapping property of quotient rings: Let f: R → R' be a ring 
homomorphism with kernel I and let J be an ideal which is contained in I. Denote 
the residue ring R/J by R̄. 


(a) There is a unique homomorphism f̄: R̄ → R' such that f̄π = f: 


[Commutative diagram: f: R → R' factors as π: R → R̄ = R/J followed by f̄: R̄ → R'.] 


(b) First Isomorphism Theorem: If J = I, then f̄ maps R̄ isomorphically to the 
image of f. □


We will now describe the fundamental relationship between ideals in a 
quotient ring R/J and ideals in the original ring R. 


(4.3) Proposition. Correspondence Theorem: Let R̄ = R/J, and let π denote the 
canonical map R → R̄. 


(a) There is a bijective correspondence between the set of ideals of R which 
contain J and the set of all ideals of R̄, given by 

I ↦ π(I) and π⁻¹(Ī) ↤ Ī. 


(b) If I ⊂ R corresponds to Ī ⊂ R̄, then R/I and R̄/Ī are isomorphic rings. 


The second part of this proposition is often called the Third Isomorphism Theorem. 


[There is also a Second Isomorphism Theorem (see Chapter 6, miscellaneous 
exercise 7).] 


Proof. To prove (a), we must check the following points: 


(i) If I is an ideal of R which contains J, then π(I) is an ideal of R̄. 
(ii) If Ī is an ideal of R̄, then π⁻¹(Ī) is an ideal of R. 
(iii) π⁻¹(π(I)) = I and π(π⁻¹(Ī)) = Ī. 


We know that the image of a subgroup is a subgroup [Chapter 2 (4.4)]. So to show 
that π(I) is an ideal of R̄, we need only prove that it is closed under multiplication 
by elements of R̄. Let r̄ ∈ R̄, and let x̄ ∈ π(I). We write r̄ = π(r) for some 
r ∈ R, and x̄ = π(x) for some x ∈ I. Then r̄x̄ = π(rx) and rx ∈ I. So 
r̄x̄ ∈ π(I). Note that this proof works for all ideals I of R. We do not need the 
assumption that I ⊃ J at this point. However, the fact that π is surjective is essential. 

Next, we denote the homomorphism R̄ → R̄/Ī by φ, and we consider the 
composed homomorphism R → R̄ → R̄/Ī. Since π and φ are surjective, so is 
φ∘π. Moreover, the kernel of φ∘π is the set of elements r ∈ R such that 
π(r) ∈ Ī = ker φ. By definition, this is π⁻¹(Ī). Therefore π⁻¹(Ī), being the kernel 
of a homomorphism, is an ideal of R. This proves (ii). Also, the First Isomorphism 
Theorem applies to the homomorphism φ∘π and shows that R/π⁻¹(Ī) is isomorphic 
to R̄/Ī. This proves part (b) of the proposition. 

It remains to prove (iii); remember that π⁻¹ isn't usually a map. The inclusions 
π⁻¹(π(I)) ⊃ I and π(π⁻¹(Ī)) ⊂ Ī are general properties of any map of sets and 
for arbitrary subsets. Moreover, the equality π(π⁻¹(Ī)) = Ī holds for any 
surjective map of sets. We omit the verification of these facts. The final point, that 
π⁻¹(π(I)) ⊂ I, is the one which requires that I ⊃ J. Let x ∈ π⁻¹(π(I)). Then 
π(x) ∈ π(I), so there is an element y ∈ I such that π(y) = π(x). Since π is a 
homomorphism, π(x − y) = 0 and x − y ∈ J = ker π. Since y ∈ I and J ⊂ I, 
this implies that x ∈ I, as required. 
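
The Correspondence Theorem can be checked by enumeration in a small case. The sketch below assumes R = Z and J = 12Z, so that R̄ = Z/12Z; it relies on the fact that every ideal of Z, and hence of Z/12Z, is principal. The function names are illustrative.

```python
n = 12
ring = range(n)

# Every ideal of Z/nZ is principal, so the multiples of each residue give all ideals.
ideals_of_quotient = {frozenset(g * k % n for k in ring) for g in ring}

# The ideals of Z containing nZ are the ideals dZ with d dividing n.
divisors = [d for d in range(1, n + 1) if n % d == 0]

def image(d):
    """pi(dZ): the residues of the multiples of d."""
    return frozenset(d * k % n for k in ring)

def preimage_generator(ideal_bar):
    """Positive generator of the inverse image, found as its least positive element."""
    return min(m for m in range(1, n + 1) if m % n in ideal_bar)

images = {d: image(d) for d in divisors}

# Part (a): I -> pi(I) is a bijection onto the ideals of Z/nZ, with inverse pi^{-1}.
is_bijection = (set(images.values()) == ideals_of_quotient
                and len(ideals_of_quotient) == len(divisors))
round_trip = all(preimage_generator(images[d]) == d for d in divisors)

# Part (b), at the level of cardinalities: Z/dZ and (Z/nZ)/pi(dZ) both have d elements.
same_size = all(n // len(images[d]) == d for d in divisors)
```

Comparing cardinalities is only a sanity check of part (b), not a proof of the isomorphism.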


The quotient construction has an important interpretation in terms of relations 
among elements in a ring R. Let us imagine performing a sequence of operations 
+, −, × on some elements of R to get a new element a. If the resulting element a is 
zero, we say that the given elements are related by the equation 

(4.4) a = 0. 


For instance, the elements 2, 3, 6 of the ring Z are related by the equation 
2 × 3 − 6 = 0. 

Now if the element a is not zero, we may ask whether it is possible to modify 
R in such a way that (4.4) becomes true. We can think of this process as adding a 
new relation, which will collapse the ring. For example, the relation 3 × 4 − 5 = 0 
does not hold in Z, because 3 × 4 − 5 = 7. But we can impose the relation 7 = 0 
on the integers. Doing so amounts to working modulo 7. 

At this point we can forget about the procedure which led us to the particular 
element a; let it be an arbitrary element of R. Now when we modify R to impose the 
relation a = 0, we want to keep the operations + and ×, so we will have to accept 
some consequences of this relation. For example, ra = 0 and b + a = b are the 
consequences of multiplying and adding given elements to both sides of a = 0. 
Performing these operations in succession gives us the consequence 


(4.5) b + ra = b. 


If we want to set a = 0, we must also set b + ra = b for all b, r ∈ R. Theorem 
(4.1) tells us that this is enough: There are no other consequences of (4.4). To see 
this, note that if we fix an element b but let r vary, the set {b + ra} is the coset 
b + (a), where (a) = aR is the principal ideal generated by a. Setting b + ra = b 
for all r is the same as equating the elements of this coset. This is precisely what 
happens when we pass from R to the quotient ring R̄ = R/(a). The elements of R̄ 
are the cosets b̄ = b + (a), and the canonical map π: R → R̄ carries all the 
elements b + ra in one coset to the same element b̄ = π(b). So exactly the right 
amount of collapsing has taken place in R̄. Also, ā = 0, because a is an element of 
the ideal (a), which is the kernel of π. So it is reasonable to view R̄ = R/(a) as the 
ring obtained by introducing the relation a = 0 into R. 

If our element a was obtained from some other elements by a sequence of ring 
operations, as we supposed in (4.4), then the fact that π is a homomorphism implies 
that the same sequence of operations gives 0 in R̄. Thus if uv + w = a for some 
u, v, w ∈ R, then the relation 


(4.6) ūv̄ + w̄ = 0 


holds in R̄. For, since π is a homomorphism, ūv̄ + w̄ = π(uv + w) = ā = 0. 

A good example of this construction is the relation n = 0 in the ring of in- 
tegers Z. The resulting ring is Z/nZ. 

More generally, we can introduce any number of relations a₁ = ··· = aₙ = 0, 
by taking the ideal I generated by a₁, ..., aₙ (3.15), which is the set of linear 
combinations {r₁a₁ + ··· + rₙaₙ | rᵢ ∈ R}. The quotient ring R̄ = R/I should be viewed 
as the ring obtained by introducing the n relations a₁ = 0, ..., aₙ = 0 into R. Since 
aᵢ ∈ I, the residues āᵢ are zero. Two elements b, b' of R have the same image in R̄ if 
and only if b' − b ∈ I, or b' = b + r₁a₁ + ··· + rₙaₙ, for some rᵢ ∈ R. Thus the 
relations 


(4.7) b + r₁a₁ + ··· + rₙaₙ = b 


are the only consequences of a₁ = ··· = aₙ = 0. 

It follows from the Third Isomorphism Theorem (4.3b) that introducing 
relations one at a time or all together leads to isomorphic results. To be precise, let a, b 
be elements of a ring R, and let R̄ = R/(a) be the result of killing a. Introducing the 
relation b̄ = 0 into the ring R̄ leads to the quotient ring R̄/(b̄), and this ring is 
isomorphic to the quotient R/(a, b) obtained by killing a and b at the same time, 
because (a, b) and (b̄) are corresponding ideals [see (4.3)]. 

Note that the more relations we add, the more collapsing takes place in the 
map R → R̄. If we add them carelessly, the worst that can happen is that we may 
end up with I = R and R̄ = 0. All relations a = 0 become true when we collapse R 
to the zero ring. 

The procedure of introducing relations will lead to a new ring in most cases. 
That is why it is so important. But in some simple cases the First Isomorphism 
Theorem can be used to relate the ring obtained to a more familiar one. We will work 
out two examples to illustrate this. 

Let R = Z[i] be the ring of Gauss integers, and let R̄ be obtained by 
introducing the relation 1 + 3i = 0. So R̄ = R/I where I is the principal ideal generated by 
1 + 3i. We begin by experimenting with the relation, looking for recognizable 
consequences. Multiplying −1 = 3i on both sides by −i, we obtain i = 3. So i = 3 in 
R̄. On the other hand, i² = −1 in R, and hence in R̄ too. Therefore 3² = −1, or 
10 = 0, in R̄. Since i = 3 and 10 = 0 in R̄, it is reasonable to guess that R̄ is 
isomorphic to Z/(10) = Z/10Z. 


(4.8) Proposition. The ring Z[i]/(1 + 3i) is isomorphic to the ring Z/10Z of 
integers modulo 10. 


Proof. Having made this guess, we can prove it by analyzing the homomorphism 
φ: Z → R̄ (3.9). By the First Isomorphism Theorem, im φ ≈ Z/(ker φ). 
So if we show that φ is surjective and that ker φ = 10Z, we will have succeeded. 
Now every element of R̄ is the residue of a Gauss integer a + bi. Since i = 3 in R̄, 
the residue of a + bi is the same as that of the integer a + 3b. This shows that φ is 
surjective. Next, let n be an element of ker φ. Using the fact that R̄ = R/I, we see 
that n must be in the ideal I, that is, that n is divisible by 1 + 3i in the ring of Gauss 
integers. So we may write n = (a + bi)(1 + 3i) = (a − 3b) + (3a + b)i for some 
integers a, b. Since n is an integer, 3a + b = 0, or b = −3a. Thus 
n = a(1 − 3i)(1 + 3i) = 10a, and this shows that ker φ ⊂ 10Z. On the other 
hand, we already saw that 10 ∈ ker φ. So ker φ = 10Z, as required. □
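
The isomorphism of Proposition (4.8) can be tested numerically. The sketch below assumes that a Gauss integer a + bi is stored as the pair (a, b) and uses the rule a + bi ↦ a + 3b (modulo 10) suggested by i = 3. It checks on a finite sample that this rule respects addition and multiplication and that it vanishes exactly on the multiples of 1 + 3i. The function names are illustrative.

```python
from itertools import product

def gauss_mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def gauss_add(z, w):
    return (z[0] + w[0], z[1] + w[1])

def residue(z):
    """Image of a + bi in Z/10Z, using i = 3."""
    a, b = z
    return (a + 3 * b) % 10

def divisible_by_1_plus_3i(z):
    """z/(1 + 3i) = z(1 - 3i)/10 must have integer real and imaginary parts."""
    a, b = z
    return (a + 3 * b) % 10 == 0 and (b - 3 * a) % 10 == 0

sample = list(product(range(-6, 7), repeat=2))

respects_add = all(residue(gauss_add(z, w)) == (residue(z) + residue(w)) % 10
                   for z in sample for w in sample)
respects_mul = all(residue(gauss_mul(z, w)) == residue(z) * residue(w) % 10
                   for z in sample for w in sample)
kernel_matches = all((residue(z) == 0) == divisible_by_1_plus_3i(z) for z in sample)
```

A finite sample does not replace the proof, but it confirms that the guess i = 3, 10 = 0 is consistent.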


Another possible way to identify the quotient R/I is to find a ring R' and a 
homomorphism φ: R → R' whose kernel is I. To illustrate this, let R̄ = C[x, y]/(xy). 
Here the fact that xy is a product can be used to find such a map φ. 


(4.10) Proposition. The ring C[x, y]/(xy) is isomorphic to the subring of the 
product ring C[x] × C[y] consisting of the pairs (p(x), q(y)) such that p(0) = q(0). 


Proof. We can identify the ring C[x, y]/(y) easily, because the principal ideal 
(y) is the kernel of the substitution homomorphism φ: C[x, y] → C[x] sending 
y ↦ 0. By the First Isomorphism Theorem, C[x, y]/(y) ≈ C[x]. Similarly, 
C[x, y]/(x) ≈ C[y]. So it is natural to look at the homomorphism to the product 
ring φ: C[x, y] → C[x] × C[y], which is defined by f(x, y) ↦ (f(x, 0), f(0, y)). 
The kernel of φ is the intersection of the kernels: ker φ = (y) ∩ (x). To be in this 
intersection, a polynomial must be divisible by both y and x. This just means that it 
is divisible by xy. So ker φ = (xy). By the First Isomorphism Theorem, 
R̄ = C[x, y]/(xy) is isomorphic to the image of the homomorphism φ. That image is 
the subring described in the statement of the proposition. □

Aside from the First Isomorphism Theorem, there are no general methods for 
identifying a quotient ring, because it will usually not be a familiar ring. The ring 
C[x, y]/(y² − x³ + x), for example, is fundamentally different from any ring we 
have seen up to now. 


5. ADJUNCTION OF ELEMENTS 


In this section we discuss a procedure which is closely related to the introduction of 
relations, that of adding new elements to a ring. Our model for this procedure is the 
construction of the complex field, starting from the real numbers. One obtains C 
from R by adjoining i, and the construction is completely formal. That is, the 
imaginary number i has no properties other than those forced by the relation 


(5.1) i² = −1. 


We are now ready to understand the general principle behind this construction. Let 
us start with an arbitrary ring R, and consider the problem of building a bigger ring 
containing the elements of R and also containing a new element, which we denote 
by α. We will probably want α to satisfy some relations such as (5.1), for instance. 
A ring R' containing R as a subring is called a ring extension of R. So we are 
looking for a suitable extension. 

Sometimes the element α may be available in a ring extension R' that we 
already know. In that case, our solution is the subring of R' generated by R and α. 
This subring is denoted by R[α]. We have already described this ring in Section 1, 
in the case R = Z and R' = C. The description is no different in general: R[α] 
consists of the elements of R' which have polynomial expressions 


rₙαⁿ + ··· + r₁α + r₀ 


with coefficients rᵢ in R. But as happens when we first construct C from R, we may 
not yet know an extension containing α. Then we must construct it abstractly. 
Actually, we already did this when we constructed the polynomial ring R[x]. 

Note that the polynomial ring R[x] is an extension of R and that it is generated 
by R and x. So the notation R[x] agrees with the one introduced above. Moreover, 
the Substitution Principle (3.4) tells us that the polynomial ring is the universal 
solution to our problem of adjoining a new element, in the following sense: If α is an 
element of any ring extension R' of R, then there is a unique map R[x] → R' 
which is the identity on R and which carries x to α. The image of this map will be 
the subring R[α]. 

Let us now consider the question of the relations which we want our new 
element to satisfy. The variable x in the polynomial ring R[x] satisfies no relations 
except those, such as 0x = 0, implied by the ring axioms. This is another way to state 
the universal property of the polynomial ring. We may want some nontrivial 
relations. But now that we have the ring R[x] in hand we can add relations to it as we 
like, using the procedure given in Section 4. We introduce relations by using the 
quotient construction on the polynomial ring R[x]. The fact that R gets replaced by 
R[x] in the construction complicates things notationally, but aside from this 
notational complication, nothing is different. 

For example, we can construct the complex numbers formally by introducing 
the relation x² + 1 = 0 into the ring of real polynomials R[x] = P. To do so, we 
form the quotient ring P̄ = P/(x² + 1). The residue of x becomes our element i. 
Note that the relation π(x² + 1) = x̄² + 1̄ = 0 holds in P̄, because the map π: P → P̄ 
is a homomorphism and because x² + 1 ∈ ker π. And since 1̄ is the unit element in 
P̄, our standard notation for the unit element drops the bar. So P̄ is obtained from R 
by adjoining an element x̄ satisfying x̄² + 1 = 0. In other words, P̄ ≈ C as 
required. 

The fact that the quotient R[x]/(x² + 1) is isomorphic to C also follows from 
the First Isomorphism Theorem (4.2b): Substitution (3.4) of i for x defines a 
surjective homomorphism φ: R[x] → C, whose kernel is the set of real polynomials 
with i as a root. Now if i is a root of a real polynomial p(x), then −i is also a root. 
Therefore x − i and x + i both divide p(x). The kernel is the set of real 
polynomials divisible by (x − i)(x + i) = x² + 1, which is the principal ideal (x² + 1). By 
the First Isomorphism Theorem, C is isomorphic to R[x]/(x² + 1). 

Another simple example of adjunction of an element was used in Section 6 of 
Chapter 8, where a formal infinitesimal element satisfying 


(5.2) ε² = 0 


was introduced to compute tangent vectors. An element of a ring R is called 
infinitesimal or nilpotent if some power is zero, and our procedure allows us to 
adjoin infinitesimals to a ring. Thus the result of adjoining an element ε satisfying 
(5.2) to a ring R is the quotient ring R' = R[x]/(x²). The residue of x is the 
infinitesimal element ε. In this ring, the relation ε² = 0 reduces all polynomial 
expressions in ε to degree < 2, so the elements of R' have the form a + bε, with 
a, b ∈ R. But the multiplication rule [Chapter 8 (6.5)] is different from the rule for 
multiplying complex numbers. 
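
To make the contrast between the two multiplication rules concrete, here is a small sketch that stores a + bε, and likewise a + bi, as a pair (a, b). The rule (a + bε)(c + dε) = ac + (ad + bc)ε used below follows from ε² = 0; the function names are illustrative.

```python
def dual_add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def dual_mul(u, v):
    """(a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, because eps^2 = 0."""
    (a, b), (c, d) = u, v
    return (a * c, a * d + b * c)

def complex_mul(u, v):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, because i^2 = -1."""
    (a, b), (c, d) = u, v
    return (a * c - b * d, a * d + b * c)

unit = (0, 1)                          # eps, respectively i
eps_squared = dual_mul(unit, unit)     # the zero element
i_squared = complex_mul(unit, unit)    # the element -1

# Evaluate p(t) = t^3 + 2t at t = 3 + eps.
t = (3, 1)
t_cubed = dual_mul(dual_mul(t, t), t)
p_of_t = dual_add(t_cubed, dual_mul((2, 0), t))
```

Here p_of_t is (33, 29), that is, p(3) + p'(3)ε, which is the kind of first-order information the infinitesimal was introduced to capture.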

In general, if we want to adjoin an element α satisfying one or more 
polynomial relations of the form 


(5.3) f(α) = cₙαⁿ + ··· + c₁α + c₀ = 0 

to a ring R, the solution is R' = R[x]/I, where I is the ideal in R[x] generated by 
the polynomials f(x). If α denotes the residue x̄ of x in R', and π denotes the 
canonical map R[x] → R', then 

(5.4) 0 = π(f(x)) = c̄ₙx̄ⁿ + ··· + c̄₀ = c̄ₙαⁿ + ··· + c̄₀. 


Here c̄ᵢ is the image in R' of the constant polynomial cᵢ. So α satisfies the relation in 
R' which corresponds to the relation (5.3) in R. The ring obtained in this way will 
often be denoted by 


(5.5) R[α] = ring obtained by adjoining α to R. 


Several elements α₁, ..., αₘ can be adjoined by repeating this procedure, or by 
introducing the appropriate relations in the polynomial ring R[x₁, ..., xₘ] in m variables 
all at once. 

One of the most important cases is that the new element α is required to satisfy 
a single monic equation of degree n > 0. Suppose we want the relation f(x) = 0, 
where f is the monic polynomial 


(5.6) f(x) = xⁿ + cₙ₋₁xⁿ⁻¹ + ··· + c₁x + c₀. 


It isn't difficult to describe the ring R[α] precisely in this special case. 


(5.7) Proposition. Let R be a ring, and let f(x) be a monic polynomial of positive 
degree n, with coefficients in R. Let R[α] denote the ring obtained by adjoining an 
element satisfying the relation f(α) = 0. The elements of R[α] are in bijective 
correspondence with vectors (r₀, ..., rₙ₋₁) ∈ Rⁿ. Such a vector corresponds to the 
linear combination 


r₀ + r₁α + r₂α² + ··· + rₙ₋₁αⁿ⁻¹, with rᵢ ∈ R. 


This proposition says that the powers 1, α, α², ..., αⁿ⁻¹ form a basis for R[α] 
over R. To multiply two such linear combinations in R[α], we use polynomial 
multiplication and then divide the product by f. The remainder is the linear combination 
of 1, α, ..., αⁿ⁻¹ which represents the product. So although addition in R' depends 
only on the degree, multiplication depends strongly on the particular polynomial f. 

For example, let R' be the result of adjoining an element α to Z satisfying the 
relation α³ + 3α + 1 = 0. So R' = Z[x]/(x³ + 3x + 1). The elements of R' are 
linear combinations r₀ + r₁α + r₂α², where the rᵢ are integers. Addition of two linear 
combinations is polynomial addition: (2 + α − α²) + (1 + α) = 3 + 2α − α², for 
instance. To multiply, we compute the product using polynomial multiplication: 
(2 + α − α²)(1 + α) = 2 + 3α − α³. Then we divide by 1 + 3α + α³: 
2 + 3α − α³ = (1 + 3α + α³)(−1) + (3 + 6α). Since 1 + 3α + α³ = 0 in R', the 
remainder 3 + 6α is the linear combination which represents the product. 
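
The multiply-then-reduce procedure can be written out as follows. This is a minimal sketch, assuming integer coefficients stored lowest degree first, so that r₀ + r₁α + r₂α² becomes [r0, r1, r2]; the function names are illustrative.

```python
def poly_mul(g, h):
    """Ordinary polynomial multiplication of nonempty coefficient lists."""
    out = [0] * (len(g) + len(h) - 1)
    for i, a in enumerate(g):
        for j, b in enumerate(h):
            out[i + j] += a * b
    return out

def reduce_mod_monic(g, f):
    """Remainder of g on division by the monic polynomial f."""
    if f[-1] != 1:
        raise ValueError("f must be monic")
    n = len(f) - 1                 # degree of f
    g = list(g)
    while len(g) > n:              # while the degree of g is at least n
        c = g.pop()                # leading coefficient of g
        shift = len(g) - n
        for i in range(n):         # subtract c * x^shift * f; the top term cancels
            g[shift + i] -= c * f[i]
    return g + [0] * (n - len(g))  # pad to a vector of length n

def multiply(u, v, f):
    return reduce_mod_monic(poly_mul(u, v), f)

f = [1, 3, 0, 1]                            # x^3 + 3x + 1
product = multiply([2, 1, -1], [1, 1], f)   # (2 + a - a^2)(1 + a)
alpha_cubed = multiply([0, 0, 1], [0, 1], f)
```

The result [3, 6, 0] is the vector of 3 + 6α, in agreement with the computation above, and alpha_cubed is the vector of −1 − 3α.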

Or let R' be obtained by adjoining an element α to F₅ with the relation 
α² − 3 = 0, that is, R' = F₅[x]/(x² − 3). Here α represents a formal square root 
of 3. The elements of R' are the 25 linear expressions a + bα in α with coefficients 
a, b ∈ F₅. This ring is a field. To prove this, we verify that every nonzero element 
a + bα of R' is invertible. Note that (a + bα)(a − bα) = a² − 3b² ∈ F₅. 
Moreover, the equation x² = 3 has no solution in F₅, and this implies that a² − 3b² ≠ 0. 
Therefore a² − 3b² is invertible in F₅ and in R'. This shows that a + bα is 
invertible too. Its inverse is (a² − 3b²)⁻¹(a − bα). 
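
Since R' has only 25 elements, the claim can also be confirmed by exhaustive search. The sketch below assumes that a + bα is stored as the pair (a, b) and takes the prime p as a parameter, so that the same check can be run over other prime fields; the function names are illustrative.

```python
def mul(u, v, p):
    """(a + b*alpha)(c + d*alpha) in F_p[x]/(x^2 - 3), using alpha^2 = 3."""
    (a, b), (c, d) = u, v
    return ((a * c + 3 * b * d) % p, (a * d + b * c) % p)

def nonzero_elements(p):
    return [(a, b) for a in range(p) for b in range(p) if (a, b) != (0, 0)]

def is_field(p):
    """True if every nonzero element has a multiplicative inverse."""
    nz = nonzero_elements(p)
    return all(any(mul(u, v, p) == (1, 0) for v in nz) for u in nz)

def inverse_by_formula(u, p):
    """The inverse (a^2 - 3b^2)^(-1) (a - b*alpha); requires a^2 - 3b^2 != 0 in F_p."""
    a, b = u
    c = pow((a * a - 3 * b * b) % p, -1, p)
    return (c * a % p, -c * b % p)

formula_ok = all(mul(u, inverse_by_formula(u, 5), 5) == (1, 0)
                 for u in nonzero_elements(5))
```

With p = 5 the search succeeds. With p = 11 it fails, because 5² = 3 in F₁₁ and so (5 + α)(6 + α) = 0 there.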

On the other hand, the same procedure applied to F₁₁ does not yield a field. 
The reason is that x² − 3 = (x + 5)(x − 5) in F₁₁[x]. So if α denotes the residue of 
x in R' = F₁₁[x]/(x² − 3), then (α + 5)(α − 5) = 0. This can be explained 
intuitively by noting that we constructed R' by adjoining a square root of 3 to F₁₁, when 
that field already contains the two square roots ±5. At first glance, one might expect 
to get F₁₁ back by this procedure. But we haven't told α whether to be equal to 5 or 
to −5. We've only told it that its square is 3. The relation (α + 5)(α − 5) = 0 
reflects this ambiguity. □

Proof of Proposition (5.7). Since R[α] is a quotient of the polynomial ring 
R[x], every element in R[α] is the residue of a polynomial. This means that it can 
be written in the form g(α) for some polynomial g(x) ∈ R[x]. The relation 
f(α) = 0 can be used to replace any polynomial g(α) of degree ≥ n by one of lower 
degree: We perform division with remainder by f(x) on the polynomial g(x), 
obtaining an expression of the form g(x) = f(x)q(x) + r(x) (3.19). Since f(α) = 0, 
g(α) = r(α). Thus every element β of R[α] can be written as a polynomial in α, of 
degree < n. 

We now show that the principal ideal generated by f(x) contains no element of 
degree < n, and therefore that g(α) ≠ 0 for every nonzero polynomial g(x) of 
degree < n. This will imply that the expression of degree < n for an element β is 
unique. The principal ideal generated by f(x) is the set of all multiples hf of f. 
Suppose h(x) = bₘxᵐ + ··· + b₀, with bₘ ≠ 0. Then the highest-degree term of 
h(x)f(x) is bₘxᵐ⁺ⁿ, and hence hf has degree m + n ≥ n. This completes the proof 
of the proposition. □


It is harder to analyze the structure of the ring obtained by adjoining an 
element which satisfies a nonmonic polynomial relation. One of the simplest and most 
important cases is obtained by adjoining a multiplicative inverse of an element to a 
ring. If an element a ∈ R has an inverse α, then α satisfies the relation 


(5.8) aα − 1 = 0. 


So we can adjoin an inverse by forming the quotient ring R' = R[x]/(ax − 1). The 
residue of x becomes the inverse α of a. This ring has no basis of the type described 
in Proposition (5.7), but we can compute in it fairly easily because every element of 
R' has the form αᵏr, where r ∈ R and k is a nonnegative integer: Say that 
β = r₀ + r₁α + ··· + rₙ₋₁αⁿ⁻¹, with rᵢ ∈ R. Then since aα = 1, we can also 
write β = αⁿ⁻¹(r₀aⁿ⁻¹ + r₁aⁿ⁻² + ··· + rₙ₋₁). 

One interesting example is that R is a polynomial ring itself, say R = F[t], 
and that we adjoin an inverse to the variable t. Then R' = F[t, x]/(xt − 1). This 
ring identifies naturally with the ring F[t, t⁻¹] of Laurent polynomials in t. A 
Laurent polynomial is a polynomial in t and t⁻¹ of the form 


(5.9) f(t) = Σ aᵢtⁱ = a₋ₙt⁻ⁿ + ··· + a₋₁t⁻¹ + a₀ + a₁t + ··· + aₙtⁿ. 


We leave the construction of this isomorphism as an exercise. 

We must now consider a point which we have suppressed in our discussion of 
adjunction of elements: When we adjoin an element α to a ring R and impose some 
relations, will our original R be a subring of the ring R[α] which we obtain? We 
know that R is contained in the polynomial ring R[x], as the subring of constant 
polynomials. So the restriction of the canonical map π: R[x] → R[x]/I = R[α] 
to constant polynomials gives us a homomorphism ψ: R → R[α], which is the map 
r ↦ r̄ considered above. The kernel of the map ψ: R → R[α] = R[x]/I is easy 
to determine in principle. It is the set of constant polynomials in the ideal I: 

(5.10) ker ψ = R ∩ I. 


It follows from Proposition (5.7) that ψ is injective, and hence that ker ψ = 0, when 
α is required to satisfy one monic equation. But ψ is not always injective. 

For example, we had better not adjoin an inverse of 0 to a ring. From the 
equation 0α = 1 we can conclude that 0 = 1. The zero element is invertible only in the 
zero ring, so if we insist on adjoining an inverse of 0, we must end up with the zero 
ring. 

More generally, let a, b be two elements of a ring R whose product ab is zero. 
Then a is not invertible unless b = 0. For, if a⁻¹ exists in R, then 
b = a⁻¹ab = a⁻¹0 = 0. It follows that if a product ab of two elements of a ring R 
is zero, then the procedure of adjoining an inverse of a to R must kill b. This can 
also be seen directly: The ideal of R[x] generated by ax − 1 contains 
−b(ax − 1) = b, which shows that the residue of b in the ring R[x]/(ax − 1) is 
zero. 

For example, 2·3 = 0 in the ring Z/(6). If we adjoin 3⁻¹ to this ring, we must 
kill 2. Killing 2 collapses Z/(6) to Z/(2) = F₂. Since 3 = 1 is invertible in F₂, no 
further action is necessary, and R' = (Z/(6))[x]/(3x − 1) ≈ F₂. Again, this can 
be checked directly. To do so, we note that the ring R' is isomorphic to 
Z[x]/(6, 3x − 1), and we analyze the two relations 6 = 0 and 3x − 1 = 0. They 
imply 6x = 0 and 6x − 2 = 0; hence 2 = 0. Then 2x = 0 too, and combined with 
3x − 1 = 0, this implies x − 1 = 0. Hence the ideal (6, 3x − 1) of Z[x] 
contains the elements 2 and x − 1. On the other hand, 6 and 3x − 1 are in the 
ideal (2, x − 1). So the two ideals are equal, and R' is isomorphic to 
Z[x]/(2, x − 1) ≈ F₂. 
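
The equality of the two ideals comes down to four explicit identities in Z[x], which can be verified mechanically. The sketch below assumes polynomials stored as integer coefficient lists, lowest degree first; the multipliers used in each combination were worked out by hand from the argument above, and the helper names are illustrative.

```python
def trim(a):
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a

def padd(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])

def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1) if a and b else []
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def combo(u, p, v, q):
    """The linear combination u*p + v*q with polynomial coefficients u, v."""
    return padd(pmul(u, p), pmul(v, q))

six, g = [6], [-1, 3]      # generators 6 and 3x - 1
two, h = [2], [-1, 1]      # generators 2 and x - 1

# 2 = x*6 - 2*(3x - 1)   and   x - 1 = -x^2*6 + (2x + 1)*(3x - 1)
two_in_first = combo([0, 1], six, [-2], g) == two
h_in_first = combo([0, 0, -1], six, [1, 2], g) == h

# 6 = 3*2 + 0*(x - 1)    and   3x - 1 = 1*2 + 3*(x - 1)
six_in_second = combo([3], two, [], h) == six
g_in_second = combo([1], two, [3], h) == g
```

The first pair of identities shows that (2, x − 1) ⊂ (6, 3x − 1), and the second pair gives the opposite inclusion.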

An element a of a ring is called a zero divisor if there is a nonzero element b 
such that ab = 0. For example, the residue of 3 is a zero divisor in the ring Z/(6). 
The term “zero divisor” is traditional, but it has been poorly chosen, because actu- 
ally every a € R divides zero: 0 = a0. 


6. INTEGRAL DOMAINS AND FRACTION FIELDS 


The difference between rings and fields is that nonzero elements of a ring R do not 
necessarily have inverses. In this section we discuss the problem of embedding a 
given ring R as a subring into a field. We saw in the last section that we can not ad- 
join the inverse of a zero divisor without killing some elements. So a ring which 
contains zero divisors can not be embedded into a field. 


(6.1) Definition. An integral domain R is a nonzero ring having no zero divisors. 
In other words, it has the property that if ab = 0, then a = 0 or b = 0, and also 
1 ≠ 0 in R. 


For example, any subring of a field is an integral domain. 

An integral domain satisfies the cancellation law: 

(6.2) If ab = ac and a ≠ 0, then b = c. 


For, from ab = ac we can deduce a(b − c) = 0. Then since a ≠ 0, it follows that 
b − c = 0. □


(6.3) Proposition. Let R be an integral domain. Then the polynomial ring R[x] is 
an integral domain. 


(6.4) Proposition. An integral domain with finitely many elements is a field. 


We leave the proofs of these propositions as exercises. □


(6.5) Theorem. Let R be an integral domain. There exists an embedding of R into 
a field, meaning an injective homomorphism R → F, where F is a field. 


We could construct the field by adjoining inverses of all nonzero elements of R, 
using the procedure described in the last section. But in this case it is somewhat 
simpler to construct F with fractions. Our model is the construction of the rational 
numbers Q as fractions of integers, and once the idea of using fractions is put forward, 
the construction follows the construction of the rational numbers very closely. 

Let R be an integral domain. A fraction will be a symbol a/b where a, b ∈ R 
and b ≠ 0. Two fractions a₁/b₁, a₂/b₂ are called equivalent, a₁/b₁ ≈ a₂/b₂, if 


a₁b₂ = a₂b₁. 


Let us check transitivity of this relation; the reflexive and symmetric properties are 
clear (see Chapter 2, Section 5). Suppose that a₁/b₁ ≈ a₂/b₂ and also that 
a₂/b₂ ≈ a₃/b₃. Then a₁b₂ = a₂b₁ and a₂b₃ = a₃b₂. Multiply by b₃ and b₁ to obtain 


a₁b₂b₃ = a₂b₁b₃ = a₃b₂b₁. 


Cancel b₂ to get a₁b₃ = a₃b₁. Thus a₁/b₁ ≈ a₃/b₃. 

The field of fractions F of R is the set of equivalence classes of fractions. As we 
do with rational numbers, we will speak of fractions a₁/b₁, a₂/b₂ as equal elements 
of F if they are equivalent fractions: a₁/b₁ = a₂/b₂ in F means a₁b₂ = a₂b₁. 
Addition and multiplication of fractions is defined as in arithmetic: 


(a/b)(c/d) = ac/bd,    a/b + c/d = (ad + bc)/bd. 


Here it must be verified that these rules lead to equivalent answers if a/b and c/d are 
replaced by equivalent fractions. Then the axioms for a field must be verified. All of 
these verifications are straightforward exercises. □
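
The first of these verifications, that the rules respect equivalence, can be illustrated in the case R = Z. The sketch below assumes a fraction a/b is stored as the unreduced pair (a, b) with b ≠ 0, so that equivalent fractions are genuinely different pairs; the function names are illustrative.

```python
def equivalent(x, y):
    """a/b is equivalent to c/d exactly when ad = cb."""
    (a, b), (c, d) = x, y
    return a * d == c * b

def frac_add(x, y):
    (a, b), (c, d) = x, y
    return (a * d + b * c, b * d)

def frac_mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c, b * d)

x, x_alt = (1, 2), (3, 6)        # two representatives of 1/2
y, y_alt = (2, 3), (-4, -6)      # two representatives of 2/3

sum_ok = equivalent(frac_add(x, y), frac_add(x_alt, y_alt))
prod_ok = equivalent(frac_mul(x, y), frac_mul(x_alt, y_alt))
```

The two sums are the distinct pairs (7, 6) and (−42, −36), which are equivalent fractions, and similarly for the products.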


Notice that R is contained in F, provided that we identify a ∈ R with the 
fraction a/1, because a/1 ≈ b/1 only if a = b. The map a ↦ a/1 is the injective 
homomorphism referred to in the theorem. 

As an example, consider the polynomial ring K[x], where K is any field. This 
is an integral domain, and its fraction field is called the field of rational functions in 
x, with coefficients in K. This field is usually denoted by 


(6.6) K(x) = {equivalence classes of fractions f/g, where f, g 
      are polynomials and g is not the zero polynomial}. 


If K = R, then evaluation of a rational function f(x)/g(x) defines an actual 
function on the real line, wherever g(x) ≠ 0. But as with polynomials, we should 
distinguish between the formally defined rational functions, which are fractions of 
polynomials, and the actual functions which they define by evaluation. 

The fraction field is a universal solution to the problem of embedding an inte- 
gral domain into a field. This is shown by the following proposition: 


(6.7) Proposition. Let R be an integral domain, with field of fractions F, and let 
φ: R → K be any injective homomorphism of R to a field K. Then the rule 


Φ(a/b) = φ(a)φ(b)⁻¹ 

defines the unique extension of φ to a homomorphism Φ: F → K. 


Proof. We must check that this extension is well defined. First, since the 
denominator of a fraction is not allowed to be zero and since φ is injective, φ(b) ≠ 0 
for any fraction a/b. Therefore φ(b) is invertible in K, and φ(a)φ(b)⁻¹ is an 
element of K. Next, we check that equivalent fractions have the same image: If 
a₂/b₂ ≈ a₁/b₁, then a₂b₁ = a₁b₂; hence φ(a₂)φ(b₁) = φ(a₁)φ(b₂), and 
Φ(a₂/b₂) = φ(a₂)φ(b₂)⁻¹ = φ(a₁)φ(b₁)⁻¹ = Φ(a₁/b₁), as required. The facts that 
Φ is a homomorphism and that it is the unique extension of φ follow easily. □


7. MAXIMAL IDEALS 


In this section we investigate surjective homomorphisms 

(7.1) φ: R → F 


from a ring R to a field F. Given such a homomorphism, the First Isomorphism 
Theorem tells us that F is isomorphic to R/ker φ. Therefore we can recover F and φ, up 
to isomorphism, from the kernel. To classify such homomorphisms, we must 
determine the ideals M such that R/M is a field. 

By the Correspondence Theorem (4.3), the ideals of R̄ = R/M correspond to 
ideals of R which contain M. Also, fields are characterized by the property of having 
exactly two ideals (3.16). So if R̄ is a field, there are exactly two ideals containing 
M, namely M and R. Such an ideal is called maximal. 


(7.2) Definition. An ideal M is maximal if M ≠ R but M is not contained in any 
ideals other than M and R. 

(7.3) Corollary. 


(a) An ideal M of a ring R is maximal if and only if R̄ = R/M is a field. 
(b) The zero ideal of R is maximal if and only if R is a field. □


The next proposition follows from the fact that all ideals of Z are principal: 


(7.4) Proposition. The maximal ideals of the ring Z of integers are the principal 
ideals generated by prime integers. □
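
By Corollary (7.3), Proposition (7.4) amounts to saying that Z/nZ is a field exactly when n is prime. The sketch below checks this by brute force for small n; the range tested and the function names are arbitrary illustrative choices.

```python
def is_field_mod(n):
    """True if every nonzero residue modulo n has a multiplicative inverse."""
    return n > 1 and all(any(a * b % n == 1 for b in range(1, n))
                         for a in range(1, n))

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

agrees = all(is_field_mod(n) == is_prime(n) for n in range(2, 60))
fields = [n for n in range(2, 20) if is_field_mod(n)]
```

The list fields contains exactly the primes below 20.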


The maximal ideals of the ring C[x] of complex polynomials in one variable 
can also be described very simply: 


(7.5) Proposition. The maximal ideals of the polynomial ring C[x] are the 
principal ideals generated by the linear polynomials x − a. The ideal Mₐ generated by 
x − a is the kernel of the substitution homomorphism sₐ: C[x] → C which sends 
f(x) ↦ f(a). Thus there is a bijective correspondence between maximal ideals Mₐ 
and complex numbers a. 


Proof. We first show that every maximal ideal is generated by a linear 
polynomial x − a. Let M be maximal. By Proposition (3.21), M is a principal ideal, 
generated by the monic polynomial f ∈ M of least degree. Since every complex 
polynomial of positive degree has a root, f is divisible by some linear polynomial x − a. 
Then f is in the principal ideal (x − a), and hence M ⊂ (x − a). Since M is 
maximal, M = (x − a). 

Next, we show that the kernel of the substitution homomorphism sₐ is 
generated by x − a: To say that a polynomial g is in the kernel of sₐ means that a is a root 
of g, or that x − a divides g. Thus x − a generates ker sₐ. Since the image of sₐ is 
a field, this also shows that (x − a) is a maximal ideal. □


The extension of Proposition (7.5) to several variables is one of the most im- 
portant theorems about polynomial rings. 


(7.6) Theorem. Hilbert's Nullstellensatz: The maximal ideals of the polynomial 
ring C[x₁, ..., xₙ] are in bijective correspondence with points of complex 
n-dimensional space. A point a = (a₁, ..., aₙ) in Cⁿ corresponds to the kernel of the 
substitution map sₐ: C[x₁, ..., xₙ] → C, which sends f(x) ↦ f(a). The kernel Mₐ 
of this map is the ideal generated by the linear polynomials 
x₁ − a₁, ..., xₙ − aₙ. 

Proof. Let $a \in C^n$, and let $M_a$ be the kernel of the substitution map $s_a$. Since
$s_a$ is surjective and $C$ is a field, $M_a$ is a maximal ideal. Next, let us verify that $M_a$ is
generated by the linear polynomials, as asserted. To do so, we expand $f(x)$ in powers
of $x_1 - a_1, \ldots, x_n - a_n$, writing

\[ f(x) = f(a) + \sum_i c_i (x_i - a_i) + \sum_{i,j} c_{ij} (x_i - a_i)(x_j - a_j) + \cdots. \]




You may recognize this as Taylor’s expansion: $c_i = \partial f/\partial x_i$, and so on. The
existence of such an expansion can be derived algebraically by substituting $x = a + u$
into $f$, expanding in powers of the variables $u$, and then substituting $u = x - a$ back
into the result. Note that every term on the right side except $f(a)$ is divisible by at
least one of the polynomials $(x_i - a_i)$. So if $f$ is in the kernel of $s_a$, that is, if
$f(a) = 0$, then $f(x)$ is in the ideal which these elements generate. This shows that
the polynomials $x_i - a_i$ generate $M_a$.
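
As a small worked instance of this expansion (the polynomial and the point are chosen
here purely for illustration), take $n = 2$, $f = x_1 x_2 - 2$, and $a = (1, 2)$, so that
$f(a) = 0$. Substituting $x_1 = 1 + u_1$, $x_2 = 2 + u_2$ gives $f = 2u_1 + u_2 + u_1 u_2$, and
substituting back $u_1 = x_1 - 1$, $u_2 = x_2 - 2$ yields

\[ x_1 x_2 - 2 = 2(x_1 - 1) + (x_2 - 2) + (x_1 - 1)(x_2 - 2), \]

which exhibits $f$ as an element of the ideal generated by $x_1 - 1$ and $x_2 - 2$.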

It is harder to prove that every maximal ideal is of the form $M_a$ for some point
$a \in C^n$. To do so, let $M$ be any maximal ideal, and let $K$ denote the field
$C[x_1, \ldots, x_n]/M$. We consider the restriction of the canonical map (4.1)
$\pi: C[x_1, \ldots, x_n] \to K$ to the subring $C[x_1]$ of polynomials in one variable:

\[ \pi_1: C[x_1] \to K. \]


(7.7) Lemma. The kernel of $\pi_1$ is either zero or else it is a maximal ideal.


Proof. Assume that the kernel is not zero, and let $f$ be a nonzero element in
ker $\pi_1$. Since $K$ is not the zero ring, ker $\pi_1$ is not the whole ring. So $f$ is not
constant, which implies that it is divisible by a linear polynomial, say $f = (x_1 - a_1)g$.
Then $\pi_1(x_1 - a_1)\pi_1(g) = \pi_1(f) = 0$ in $K$. Since $K$ is a field, $\pi_1(x_1 - a_1) = 0$ or
$\pi_1(g) = 0$. So one of the two elements $x_1 - a_1$ or $g$ is in ker $\pi_1$. By induction on
the degree of $f$, ker $\pi_1$ contains a linear polynomial. Hence it is a maximal ideal
(7.5). □


We are going to show that ker $\pi_1$ is not the zero ideal. It will follow that $M$
contains a linear polynomial of the form $x_1 - a_1$. Since the index 1 can be replaced
by any other index, $M$ contains polynomials of the form $x_\nu - a_\nu$ for every
$\nu = 1, \ldots, n$. This will show that $M$ is contained in, and hence equal to, the kernel
of a substitution map $f(x) \mapsto f(a)$, as claimed.

So, suppose ker $\pi_1 = (0)$. Then $\pi_1$ maps $C[x_1]$ isomorphically to its image,
which is a subring of $K$. According to Proposition (6.7), this map can be extended to
the field of fractions of $C[x_1]$. Hence $K$ contains a field isomorphic to the field of
rational functions $C(x)$ [see (3.17)].

Now the monomials $x^i = x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$ form a basis of $C[x_1, \ldots, x_n]$ as a
vector space over $C$ (see Section 2). Thus $C[x_1, \ldots, x_n]$ has a countable basis (Appendix,
Section 1). Since $K$ is a quotient of $C[x_1, \ldots, x_n]$, there is a countable family which
spans $K$ as vector space over $C$, namely the residues of the monomials span this
field. We will show that there are uncountably many linearly independent elements
in $C(x)$. It will follow [Lemma (7.9)] that $C(x)$ cannot be isomorphic to a subspace
of $K$. This contradiction will show ker $\pi_1 \neq (0)$.

The fact we need is that the elements of the complex field C do not form a 
countable set [Appendix (1.7)]. Using this fact, the following two lemmas will finish 
the proof: 


(7.8) Lemma. The uncountably many rational functions $(x - a)^{-1}$, $a \in C$, are
linearly independent.

Proof. A rational function $f/g$ defines an actual function by evaluation, at all
points of the complex plane at which $g \neq 0$. The rational function $(x - a)^{-1}$ has a
pole at $a$, which means that it takes on arbitrarily large values near $a$. It is bounded
near any other point. Consider a linear combination

\[ \sum_{i=1}^{n} \frac{c_i}{x - a_i}, \]

where $a_1, \ldots, a_n$ are distinct complex numbers and where some coefficient, say $c_1$, is
not zero. The first term of this sum is unbounded near $a_1$, but the others are
bounded there. It follows that the linear combination does not define the zero
function; hence it is not zero. □


(7.9) Lemma. Let V be a vector space which is spanned by a countable family 
{v1,v2,...} of vectors. Then every set L of linearly independent vectors in V is finite 
or countably infinite. 


Proof. Let $L$ be a linearly independent subset of $V$, let $V_n$ be the span of the
first $n$ vectors $v_1, \ldots, v_n$, and let $L_n = L \cap V_n$. Then $L_n$ is a linearly independent set
in a finite-dimensional space $V_n$, hence it is a finite set [Chapter 3 (3.16)]. Moreover,
$L$ is the union of all the $L_n$’s. The union of countably many finite sets is finite
or countably infinite. □


8. ALGEBRAIC GEOMETRY 


To me algebraic geometry is algebra with a kick. 


Solomon Lefschetz 


Let $V$ be a subset of complex n-space $C^n$. If $V$ can be defined as the set of common
zeros of a finite number of polynomials in $n$ variables, then it is called an algebraic
variety, or just a variety for short. (I don’t know the origin of this unattractive
term.) For instance, a complex line in $C^2$ is, by definition, the set of solutions of a
linear equation $ax + by + c = 0$. This is a variety. So is a point. The point $(a, b)$ is
the set of common zeros of the two polynomials $x - a$ and $y - b$. We have seen a
number of other interesting varieties already. The group $SL_2(C)$, for example, being
the locus of solutions of the polynomial equation $x_{11}x_{22} - x_{12}x_{21} - 1 = 0$, is a
variety in $C^4$.

Hilbert’s Nullstellensatz provides us with an important link between algebra
and geometry. It tells us that the maximal ideals in the polynomial ring
$C[x] = C[x_1, \ldots, x_n]$ correspond to points in $C^n$. This correspondence can also be
used to relate algebraic varieties to quotient rings of the polynomial ring.


(8.1) Theorem. Let $f_1, \ldots, f_r$ be polynomials in $C[x_1, \ldots, x_n]$, and let $V$ be the
variety defined by the system of equations $f_1(x) = 0, \ldots, f_r(x) = 0$. Let $I$ be the ideal
$(f_1, \ldots, f_r)$ generated by the given polynomials. The maximal ideals of the quotient
ring $R = C[x]/I$ are in bijective correspondence with points of $V$.


Proof. The maximal ideals of $R$ correspond to those maximal ideals of $C[x]$
which contain $I$ [Correspondence Theorem (4.3)]. And an ideal will contain $I$ if and
only if it contains the generators $f_1, \ldots, f_r$ of $I$. On the other hand, the maximal ideal
$M_a$ which corresponds to a point $a \in C^n$ is the kernel of the substitution map
$f(x) \mapsto f(a)$. So $f_i \in M_a$ if and only if $f_i(a) = 0$, which means that $a \in V$. □
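
A one-variable illustration of this correspondence (the polynomial is chosen here
only as an example): let $f = x^2 - 1$ in $C[x]$, so the variety is the two-point set
$V = \{1, -1\}$. A maximal ideal $(x - a)$ of $C[x]$ contains $x^2 - 1$ exactly when
$a^2 - 1 = 0$, so

\[ \text{the maximal ideals of } C[x]/(x^2 - 1) \text{ are the images of } (x - 1) \text{ and } (x + 1), \]

one for each point of $V$.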


This theorem shows that the algebraic properties of the ring $R$ are closely
connected with the geometry of $V$. In principle, all properties of the system of
polynomial equations

(8.2)    $f_1(x) = \cdots = f_r(x) = 0$

are reflected in the structure of the ring $R = C[x]/(f_1, \ldots, f_r)$. The theory of this
relationship is the field of mathematics called algebraic geometry. We won’t take the
time to go very far into it here. The important thing for us to learn is that geometric
properties of the variety provide information about the ring, and conversely.

The simplest question about a set is whether or not it is empty. So we might 
ask whether it is possible for a ring to have no maximal ideals at all. It turns out that 
this happens only for the zero ring: 


(8.3) Theorem. Let R be a ring. Every ideal J of R which is not the unit ideal is 
contained in a maximal ideal. 


(8.4) Corollary. The only ring R having no maximal ideals is the zero ring. o 


Theorem (8.3) can be proved using the Axiom of Choice, or Zorn’s Lemma. 
However, for quotients of polynomial rings it is a consequence of the Hilbert Basis 
Theorem, which we will prove later [Chapter 12 (5.18)]. Rather than enter into a 
discussion of the Axiom of Choice, we will defer further discussion of the proof to 
Chapter 12. 

If we put Theorems (8.1) and (8.3) together, we obtain another important cor- 
ollary: 


(8.5) Corollary. Let $f_1, \ldots, f_r$ be polynomials in $C[x_1, \ldots, x_n]$. If the system of
equations $f_1 = \cdots = f_r = 0$ has no solution in $C^n$, then 1 is a linear combination

\[ 1 = \sum_i g_i f_i \]

of the $f_i$, with polynomial coefficients.


For, if the system has no solution, then Theorem (8.1) tells us that there is no
maximal ideal containing the ideal $I = (f_1, \ldots, f_r)$. By Theorem (8.3), $I$ is the unit
ideal. □
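
For a very small instance of the corollary (an illustrative example, not one taken
from the text), consider $f_1 = x$ and $f_2 = xy - 1$ in $C[x, y]$. The system
$f_1 = f_2 = 0$ has no solution, since $x = 0$ forces $xy - 1 = -1$. Accordingly, 1 is
a combination of the two polynomials:

\[ 1 = y \cdot f_1 - 1 \cdot f_2 = y \cdot x - (xy - 1). \]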

Most choices of three polynomials $f_1, f_2, f_3$ in two variables $x, y$ have no common
solutions. It follows that we can usually express 1 as a linear combination
$1 = p_1 f_1 + p_2 f_2 + p_3 f_3$, where $p_i$ are polynomials. This is not obvious. For
instance, the ideal generated by

(8.6)    $f_1 = x^2 + y^2 - 1$,  $f_2 = x^2 - y + 1$,  $f_3 = xy - 1$

is the unit ideal. This can be proved by showing that the set of equations
$f_1 = f_2 = f_3 = 0$ has no solution in $C^2$. If we didn’t have the Nullstellensatz, it
might take us some time to discover that we could write 1 as a linear combination,
with polynomial coefficients, of these three polynomials.

The Nullstellensatz has been reformulated in many ways, and actually the one 
we gave in the last section is not its original form. Here is the original: 


(8.7) Theorem. Classical form of the Nullstellensatz: Let $f_1, \ldots, f_r$ and $g$ be
polynomials in $C[x_1, \ldots, x_n]$. Let $V$ be the variety of zeros of $f_1, \ldots, f_r$, and let $I$ be the
ideal generated by these polynomials. If $g = 0$ identically on $V$, then some power of
$g$ is in the ideal $I$.


Proof. To prove this we study the ring obtained by inverting the polynomial
$g$, by means of the equation $gy = 1$. Assume that $g$ vanishes identically on $V$.
Consider the $r + 1$ polynomials $f_1(x), \ldots, f_r(x), g(x)y - 1$ in the variables $x_1, \ldots, x_n, y$.
The last is the only polynomial which involves the variable $y$. Notice that these
polynomials have no common zero in $C^{n+1}$. For, if $f_1, \ldots, f_r$ vanish at a point
$(a_1, \ldots, a_n, b) \in C^{n+1}$, then by hypothesis $g$ vanishes too, and hence $gy - 1$ takes
the value $-1$. Corollary (8.5) applies and tells us that the polynomials
$f_1, \ldots, f_r, gy - 1$ generate the unit ideal in $C[x, y]$. So we may write

\[ 1 = \sum_i p_i(x, y) f_i(x) + q(x, y)(g(x)y - 1). \]

We substitute $y = 1/g$ into this equation, obtaining

\[ 1 = \sum_i p_i(x, g^{-1}) f_i(x). \]

We now clear denominators in $p_i(x, g^{-1})$, multiplying both sides of the equation by a
sufficiently large power of $g$. This yields the required polynomial expression

\[ g(x)^N = \sum_i h_i(x) f_i(x), \]

where $h_i(x) = g(x)^N p_i(x, g^{-1})$. □
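
The steps of this proof can be traced on a toy case (chosen here for illustration
only): one variable $x$, a single polynomial $f_1 = x^2$, and $g = x$. Here $V = \{0\}$ and
$g$ vanishes on $V$. The polynomials $x^2$ and $xy - 1$ have no common zero, and indeed

\[ 1 = y^2 \cdot x^2 - (xy + 1)(xy - 1), \]

so $p_1 = y^2$ and $q = -(xy + 1)$. Substituting $y = 1/x$ kills the second term and leaves
$1 = x^{-2} \cdot x^2$. Multiplying by $g^2 = x^2$ clears the denominator:

\[ g^2 = x^2 = 1 \cdot f_1, \]

so the power $g^2$ lies in the ideal $(f_1)$, while $g$ itself does not.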


It is not easy to get a good feeling for a general algebraic variety in $C^n$, but the
general shape of a variety in $C^2$ can be described fairly simply.


(8.8) Proposition. Two nonzero polynomials $f(x, y)$, $g(x, y)$ in two variables have
only finitely many common zeros, unless they have a nonconstant polynomial factor
in common.

If the degrees of $f$ and $g$ are $m$ and $n$ respectively, the number of common zeros
is bounded by $mn$. This is known as the Bezout bound. For instance, two conics
intersect in at most four points. It is somewhat harder to prove the Bezout bound
than just the finiteness, and we won’t give a proof.


Proof of Proposition (8.8). We assume that f and g have no common noncon- 
stant factor. Let F denote the field of rational functions in x, the field of fractions of 
the ring C[x]. It is useful to regard f and g as elements of the polynomial ring F[y] 
in one variable, because we can use the fact that every ideal of F[y] is principal. Let 
I denote the ideal generated by f, g in F[y]. This is a principal ideal, generated by 
the greatest common divisor h of f and g in F[y] (3.22). If f and g have no common 
nonconstant factor in F[y], then J is the unit ideal. 

Our assumption is that $f$ and $g$ have no common factor in $C[x, y]$, not that they
have no common factor in $F[y]$, so we need to relate these two properties. Factoring
polynomials is one of the topics of the next chapter, so we state the fact which we
need here and defer the proof (see Chapter 11 (3.9)).


(8.9) Lemma. Let $f, g \in C[x, y]$, and let $F$ be the field of rational functions in $x$.
If $f$ and $g$ have a common factor in $F[y]$ which is not an element of $F$, then they have
a common nonconstant factor in $C[x, y]$.


We return to the proof of the proposition. Since our two polynomials $f, g$ have no
common factor in $C[x, y]$, they are relatively prime in $F[y]$, so the ideal $I$ they
generate in $F[y]$ is the unit ideal. We may therefore write $1 = rf + sg$, where $r, s$ are
elements of $F[y]$. Then $r, s$ have denominators which are polynomials in $x$ alone,
and we may clear these denominators, multiplying both sides of the equation by a
suitable polynomial $p(x)$. This results in an equation of the form


p(x) = u(x, y) f(x,y) + v(x, y)g(x, y), 


where $u, v \in C[x, y]$. It follows from this equation that a common zero of $f$ and $g$
must also be a zero of $p$. But $p$ is a polynomial in $x$ alone, and a polynomial in one
variable has only finitely many roots. So the variable $x$ takes on only finitely many
values at the common zeros of $f, g$. The same thing is true of the variable $y$. It
follows that the common zeros form a finite set. □
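
To see the elimination step of this proof in a concrete case (an example supplied
here for illustration), take $f = y - x^2$ and $g = y - 1$. In $F[y]$ we have
$1 = \frac{1}{1 - x^2} f - \frac{1}{1 - x^2} g$, and clearing the denominator gives

\[ p(x) = 1 - x^2 = 1 \cdot f(x, y) + (-1) \cdot g(x, y). \]

Any common zero must therefore satisfy $x = \pm 1$, and $g = 0$ forces $y = 1$, so the
common zeros are the two points $(1, 1)$ and $(-1, 1)$. This agrees with the Bezout
bound $mn = 2 \cdot 1 = 2$.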


This proposition shows that the most interesting varieties in $C^2$ are those which
are defined as the zeros of a single polynomial $f(x, y)$. These loci are called algebraic
curves, or Riemann surfaces, and their geometry can be quite subtle. A
Riemann surface is two-dimensional, so calling it an algebraic curve would seem to
be a misnomer. This use of the term curve refers to the fact that such a locus can be
described analytically by one complex parameter, near a point.

A rough description of such a variety, when $f$ is irreducible, follows. (A polynomial
is called irreducible if it is not the product of two nonconstant polynomials.)
We regard $f(x, y)$ as a polynomial in $y$ whose coefficients are polynomials in $x$, say

(8.10)    $f(x, y) = u_n(x)y^n + \cdots + u_1(x)y + u_0(x)$,

with $u_i(x) \in C[x]$.


(8.11) Proposition. Let $f(x, y)$ be an irreducible polynomial in $C[x, y]$ which is
not a polynomial in $x$ alone, and let $S$ be the locus of zeros of $f$ in $C^2$. Let $n$ denote
the degree of $f$, as a polynomial in $y$.


(a) For every value $a$ of the variable $x$, there are at most $n$ points of $S$ whose
x-coordinate is $a$.

(b) There is a finite set $A$ of values of $x$ such that if $a \notin A$ then there are exactly $n$
points of $S$ whose x-coordinate is $a$.


Proof. Let $a \in C$, and consider the polynomial $f(a, y)$. The points $(a, b) \in S$
are those such that $b$ is a root of $f(a, y)$. This polynomial is not identically zero,
because if it were, then $x - a$ would divide each of the coefficients $u_i(x)$, and hence it
would divide $f$. But $f$ is assumed to be irreducible. Next, the degree of $f(a, y)$ in $y$ is
at most $n$, and so it has at most $n$ roots. It will have fewer than $n$ roots if either


(8.12) 


(i) The degree of f(a, y) is less than n, or 
(ii) the degree of f(a, y) is n, but this polynomial has a multiple root. 


Case (i) occurs when the leading coefficient $u_n(x)$ vanishes at $a$, that is, when $a$
is a root of $u_n(x)$. Since $u_n$ is a polynomial in $x$, there are finitely many such values.

Now a complex number $b$ is a multiple root of a polynomial $h(y)$ [meaning that
$(y - b)^2$ divides $h(y)$] if and only if it is a root of $h(y)$ and of its derivative $h'(y)$.
The proof of this fact is left as an exercise. In our situation, $h(y) = f(a, y)$. The first
variable is fixed, so the derivative is the partial derivative with respect to $y$. Thus
case (ii) occurs at points $(a, b)$ which are common zeros of $f$ and $\partial f/\partial y$. Note that $f$
does not divide the partial derivative $\partial f/\partial y$, because the degree of the partial
derivative in $y$ is $n - 1$, which is less than the degree of $f$ in $y$. Since $f$ is assumed to
be irreducible, $f$ and $\partial f/\partial y$ have no nonconstant factor in common. Proposition (8.8)
tells us that there are finitely many common zeros. □


Proposition (8.11) can be summed up by saying that $S$ is an n-sheeted covering
of the complex x-plane $P$. Since there is a finite set $A$ above which $S$ has fewer than
$n$ sheets, it is called a branched covering. For example, consider the locus
$x^2 + xy^2 - 1 = 0$. This equation has two solutions $y$ for every value of $x$ except
$x = 0, \pm 1$. There is no solution with $x = 0$, and there is only one with $x = 1$ or
$-1$. So this locus is a branched double covering of $P$.
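
The sheet count in this example can be checked numerically. The sketch below takes
the locus to be $x^2 + xy^2 - 1 = 0$ (the reading consistent with the solution counts
just stated) and, for a fixed complex $x$, lists the distinct values of $y$ on the locus.
The function name `sheet_points` and the tolerance `tol` are illustrative choices, not
part of the text.

```python
import cmath


def sheet_points(x, tol=1e-12):
    """Return the distinct y with x^2 + x*y^2 - 1 = 0 for a fixed complex x."""
    if abs(x) < tol:
        return []             # at x = 0 the equation reads -1 = 0: no points
    w = (1 - x * x) / x       # the equation is equivalent to y^2 = (1 - x^2)/x
    if abs(w) < tol:
        return [0j]           # double root y = 0: a branch point (x = 1 or -1)
    r = cmath.sqrt(w)
    return [r, -r]            # two sheets above a general x


for x in (0, 1, -1, 2, 1j):
    print(x, len(sheet_points(x)))
```

Away from $x = 0, \pm 1$ the function returns two points, matching the description of
the locus as a branched double covering.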

Here is the precise definition of a branched covering: 

(8.13) Definition. An n-sheeted branched covering of the complex plane $P$ is a
topological space $S$ together with a continuous map $\pi: S \to P$, such that


(a) $\pi$ is n-to-one on the complement of a finite set $A$ in $P$.


(b) For every point $x_0 \in P - A$, there is an open neighborhood $U$ of $x_0$ so that
$\pi^{-1}(U)$ is made up of $n$ disconnected parts ($\pi^{-1}(U) = V_1 \cup \cdots \cup V_n$), each $V_i$
is open in $S$, and $\pi$ maps $V_i$ homeomorphically to $U$.


(8.14) Figure. Part of an n-sheeted covering. 


(8.15) Corollary. Let f(x,y) be an irreducible polynomial in C[x, y] which has 
degree n > 0 in the variable y. The Riemann surface of f(x,y) is an n-sheeted 
branched covering of the plane. o 


Proof. The fact that the Riemann surface $S$ of $f$ has the first property of a
branched covering is Proposition (8.11). So it remains to verify property (8.13b).
Consider a point $x_0$ at which $f(x_0, y)$ has $n$ roots $y_1, \ldots, y_n$. Then $(\partial f/\partial y)(x_0, y_i) \neq 0$
because $y_i$ is not a multiple root of $f(x_0, y)$. The Implicit Function Theorem
[Appendix (4.1)] applies and tells us that equation (8.2) can be solved for $y = a_1(x)$
as a continuous function of $x$ in some neighborhood $U$ of $x_0$, in such a way that
$y_1 = a_1(x_0)$. Similarly, we can solve for $y = a_i(x)$ such that $y_i = a_i(x_0)$. Cutting
down the size of $U$, we may assume that each $a_i(x)$ is defined on $U$. Since $y_1, \ldots, y_n$
are all distinct and the $a_i(x)$ are continuous functions, they have no common values
provided $U$ is made sufficiently small.

Consider the graphs of the n continuous functions a;: 


(8.16)    $V_i = \{(x, a_i(x)) \mid x \in U\}$.


They are disjoint because the $a_i(x)$ have no common values on $U$. The map
$V_i \to U$ is a homeomorphism because it has the continuous inverse function
$U \to V_i$. The inverse sends $x \mapsto (x, a_i(x))$. And

\[ \pi^{-1}(U) = V_1 \cup \cdots \cup V_n \]

because $S$ has at most $n$ points above any $x$, and the $n$ points have been exhibited as
$(x, a_i(x)) \in V_i$. Each of the sets $V_i$ is closed in $U \times C$, because it is the set of zeros
of the continuous function $y - a_i(x)$. Then $V_i$ is also closed in the subset $\pi^{-1}(U)$ of
$U \times C$. It follows that $V_1$ is open in $\pi^{-1}(U)$, because it is the complement of the
closed set $V_2 \cup \cdots \cup V_n$. Since $U$ is open in $C$, its inverse image $\pi^{-1}(U)$ is open in $S$.
Thus $V_1$ is open in an open subset of $S$, which shows that $V_1$ is open in $S$ too.
Similarly, $V_i$ is open for each $i$. □


We will look at these loci again in Chapter 13. 


In helping geometry, modern algebra is helping itself above all. 


Oscar Zariski 


EXERCISES 


1. Definition of a Ring


1. Prove the following identities in an arbitrary ring R.
(a) $0a = 0$ (b) $-a = (-1)a$ (c) $(-a)b = -(ab)$

2. Describe explicitly the smallest subring of the complex numbers which contains the real
cube root of 2.

3. Let $\alpha = \frac{1}{2}i$. Prove that the elements of $Z[\alpha]$ form a dense subset of the complex plane.

4. Prove that $7 + \sqrt{2}$ and $\sqrt{3} + \sqrt{-5}$ are algebraic numbers.

5. Prove that for all integers $n$, $\cos(2\pi/n)$ is an algebraic number.

6. Let $Q[\alpha, \beta]$ denote the smallest subring of C containing Q, $\alpha = \sqrt{2}$, and $\beta = \sqrt{3}$,
and let $\gamma = \alpha + \beta$. Prove that $Q[\alpha, \beta] = Q[\gamma]$.

7. Let S be a subring of R which is a discrete set in the sense of Chapter 5 (4.3). Prove that
$S = Z$.

8. In each case, decide whether or not S is a subring of R.
(a) S is the set of all rational numbers of the form $a/b$, where $b$ is not divisible by 3, and
$R = Q$.
(b) S is the set of functions which are linear combinations of the functions
$\{1, \cos nt, \sin nt \mid n \in Z\}$, and R is the set of all functions $R \to R$.
(c) (not commutative) S is the set of real matrices of the form
$\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$, and R is the set of all real $2 \times 2$ matrices.

9. In each case, decide whether the given structure forms a ring. If it is not a ring,
determine which of the ring axioms hold and which fail:
(a) U is an arbitrary set, and R is the set of subsets of U. Addition and multiplication of
elements of R are defined by the rules $A + B = A \cup B$ and $A \cdot B = A \cap B$.
(b) U is an arbitrary set, and R is the set of subsets of U. Addition and multiplication of
elements of R are defined by the rules $A + B = (A \cup B) - (A \cap B)$ and
$A \cdot B = A \cap B$.
(c) R is the set of continuous functions $R \to R$. Addition and multiplication are
defined by the rules $[f + g](x) = f(x) + g(x)$ and $[f \circ g](x) = f(g(x))$.

10. Determine all rings which contain the zero ring as a subring.

11. Describe the group of units in each ring.
(a) $Z/12Z$ (b) $Z/7Z$ (c) $Z/8Z$ (d) $Z/nZ$

12. Prove that the units in the ring of Gauss integers are $\{\pm 1, \pm i\}$.

13. An element $x$ of a ring R is called nilpotent if some power of $x$ is zero. Prove that if $x$ is
nilpotent, then $1 + x$ is a unit in R.

14. Prove that the product set $R \times R'$ of two rings is a ring with component-wise addition
and multiplication:

$(a, a') + (b, b') = (a + b, a' + b')$ and $(a, a')(b, b') = (ab, a'b')$.

This ring is called the product ring.


2. Formal Construction of Integers and Polynomials 


1. Prove that every natural number $n$ except 1 has the form $m'$ for some natural number $m$.


2. Prove the following laws for the natural numbers. 
(a) the commutative law for addition 
(b) the associative law for multiplication 
(c) the distributive law 
(d) the cancellation law for addition: if $a + b = a + c$, then $b = c$
(e) the cancellation law for multiplication: if $ab = ac$, then $b = c$

3. The relation < on N can be defined by the rule $a < b$ if $b = a + n$ for some $n$. Assume
that the elementary properties of addition have been proved.
(a) Prove that if $a < b$, then $a + n < b + n$ for all $n$.
(b) Prove that the relation < is transitive.
(c) Prove that if $a, b$ are natural numbers, then precisely one of the following holds:
$a < b$, $a = b$, $b < a$.
(d) Prove that if $n \neq 1$, then $a < an$.

4. Prove the principle of complete induction: Let S be a subset of N with the following
property: If $n$ is a natural number such that $m \in S$ for every $m < n$, then $n \in S$. Then
$S = N$.

*5. Define the set Z of all integers, using two copies of N and an element representing zero,
define addition and multiplication, and derive the fact that Z is a ring from the properties
of addition and multiplication of natural numbers.

6. Let R be a ring. The set of all formal power series $p(t) = a_0 + a_1 t + a_2 t^2 + \cdots$, with
$a_i \in R$, forms a ring which is usually denoted by $R[[t]]$. (By formal power series we
mean that there is no requirement of convergence.)
(a) Prove that the formal power series form a ring.
(b) Prove that a power series $p(t)$ is invertible if and only if $a_0$ is a unit of R.


7. Prove that the units of the polynomial ring R[x] are the nonzero constant polynomials. 


3. Homomorphisms and Ideals 


1. Show that the inverse of a ring isomorphism $\varphi: R \to R'$ is an isomorphism.

2. Prove or disprove: If an ideal $I$ contains a unit, then it is the unit ideal.

3. For which integers $n$ does $x^2 + x + 1$ divide $x^4 + 3x^3 + x^2 + 6x + 10$ in $Z/nZ[x]$?

4. Prove that in the ring $Z[x]$, $(2) \cap (x) = (2x)$.


5. Prove the equivalence of the two definitions (3.11) and (3.12) of an ideal.

6. Is the set of polynomials $a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ such that $2^{k+1}$ divides $a_k$
an ideal in $Z[x]$?

7. Prove that every nonzero ideal in the ring of Gauss integers contains a nonzero integer.

8. Describe the kernel of the following maps.
(a) $R[x, y] \to R$ defined by $f(x, y) \mapsto f(0, 0)$
(b) $R[x] \to C$ defined by $f(x) \mapsto f(2 + i)$

9. Describe the kernel of the map $Z[x] \to R$ defined by $f(x) \mapsto f(1 + \sqrt{2})$.

10. Describe the kernel of the homomorphism $\varphi: C[x, y, z] \to C[t]$ defined by $\varphi(x) = t$,
$\varphi(y) = t^2$, $\varphi(z) = t^3$.

11. (a) Prove that the kernel of the homomorphism $\varphi: C[x, y] \to C[t]$ defined by
$x \mapsto t^2$, $y \mapsto t^3$ is the principal ideal generated by the polynomial $y^2 - x^3$.
(b) Determine the image of $\varphi$ explicitly.

12. Prove the existence of the homomorphism (3.8).

13. State and prove an analogue of (3.8) when R is replaced by an arbitrary infinite field.

14. Prove that if two rings $R, R'$ are isomorphic, so are the polynomial rings $R[x]$ and
$R'[x]$.

15. Let R be a ring, and let $f(y) \in R[y]$ be a polynomial in one variable with coefficients in
R. Prove that the map $R[x, y] \to R[x, y]$ defined by $x \mapsto x + f(y)$, $y \mapsto y$ is an
automorphism of $R[x, y]$.

16. Prove that a polynomial $f(x) = \sum a_i x^i$ can be expanded in powers of $x - a$:
$f(x) = \sum c_i (x - a)^i$, and that the coefficients $c_i$ are polynomials in the coefficients $a_i$,
with integer coefficients.

17. Let $R, R'$ be rings, and let $R \times R'$ be their product. Which of the following maps are
ring homomorphisms?
(a) $R \to R \times R'$, $r \mapsto (r, 0)$
(b) $R \to R \times R$, $r \mapsto (r, r)$
(c) $R \times R' \to R$, $(r_1, r_2) \mapsto r_1$
(d) $R \times R \to R$, $(r_1, r_2) \mapsto r_1 r_2$
(e) $R \times R \to R$, $(r_1, r_2) \mapsto r_1 + r_2$

18. (a) Is $Z/(10)$ isomorphic to $Z/(2) \times Z/(5)$?
(b) Is $Z/(8)$ isomorphic to $Z/(2) \times Z/(4)$?

19. Let R be a ring of characteristic $p$. Prove that the map $R \to R$ defined by $x \mapsto x^p$ is a
ring homomorphism. This map is called the Frobenius homomorphism.

20. Determine all automorphisms of the ring $Z[x]$.

21. Prove that the map $Z \to R$ (3.9) is compatible with multiplication of positive integers.

22. Prove that the characteristic of a field is either zero or a prime integer.

23. Let R be a ring of characteristic $p$. Prove that if $a$ is nilpotent then $1 + a$ is unipotent,
that is, some power of $1 + a$ is equal to 1.

24. (a) The nilradical N of a ring R is the set of its nilpotent elements. Prove that N is an
ideal.
(b) Determine the nilradicals of the rings $Z/(12)$, $Z/(n)$, and $Z$.

25. (a) Prove Corollary (3.20).
(b) Prove Corollary (3.22).

26. Determine all ideals of the ring $R[[t]]$ of formal power series with real coefficients.

27. Find an ideal in the polynomial ring $F[x, y]$ in two variables which is not principal.

28. Let R be a ring, and let $I$ be an ideal of the polynomial ring $R[x]$. Suppose that the lowest
degree of a nonzero element of $I$ is $n$ and that $I$ contains a monic polynomial of degree $n$.
Prove that $I$ is a principal ideal.

29. Let $I, J$ be ideals of a ring R. Show by example that $I \cup J$ need not be an ideal, but
show that $I + J = \{r \in R \mid r = x + y$, with $x \in I, y \in J\}$ is an ideal. This ideal is
called the sum of the ideals $I, J$.

30. (a) Let $I, J$ be ideals of a ring R. Prove that $I \cap J$ is an ideal.
(b) Show by example that the set of products $\{xy \mid x \in I, y \in J\}$ need not be an ideal,
but that the set of finite sums $\sum x_\nu y_\nu$ of products of elements of $I$ and $J$ is an ideal.
This ideal is called the product ideal.
(c) Prove that $IJ \subset I \cap J$.
(d) Show by example that $IJ$ and $I \cap J$ need not be equal.

31. Let $I, J, J'$ be ideals in a ring R. Is it true that $I(J + J') = IJ + IJ'$?

32. If R is a noncommutative ring, the definition of an ideal is a set $I$ which is closed under
addition and such that if $r \in R$ and $x \in I$, then both $rx$ and $xr$ are in $I$. Show that the
noncommutative ring of $n \times n$ real matrices has no proper ideal.

33. Prove or disprove: If $a^2 = a$ for all $a$ in a ring R, then R has characteristic 2.

34. An element $e$ of a ring S is called idempotent if $e^2 = e$. Note that in a product $R \times R'$ of
rings, the element $e = (1, 0)$ is idempotent. The object of this exercise is to prove a
converse.
(a) Prove that if $e$ is idempotent, then $e' = 1 - e$ is also idempotent.
(b) Let $e$ be an idempotent element of a ring S. Prove that the principal ideal $eS$ is a
ring, with identity element $e$. It will probably not be a subring of S because it will
not contain 1 unless $e = 1$.
(c) Let $e$ be idempotent, and let $e' = 1 - e$. Prove that S is isomorphic to the product
ring $(eS) \times (e'S)$.


4. Quotient Rings and Relations in a Ring 


1. Prove that the image of the homomorphism $\varphi$ of Proposition (4.9) is the subring
described in the proposition.

2. Determine the structure of the ring $Z[x]/(x^2 + 3, p)$, where (a) $p = 3$, (b) $p = 5$.

3. Describe each of the following rings.
(a) $Z[x]/(x^2 - 3, 2x + 4)$ (b) $Z[i]/(2 + i)$

4. Prove Proposition (4.2).

5. Let $R'$ be obtained from a ring R by introducing the relation $\alpha = 0$, and let $\psi: R \to R'$
be the canonical map. Prove the following universal property for this construction: Let
$\varphi: R \to \tilde{R}$ be a ring homomorphism, and assume that $\varphi(\alpha) = 0$ in $\tilde{R}$. There is a unique
homomorphism $\varphi': R' \to \tilde{R}$ such that $\varphi' \circ \psi = \varphi$.

6. Let $I, J$ be ideals in a ring R. Prove that the residue of any element of $I \cap J$ in $R/IJ$ is
nilpotent.

7. Let $I, J$ be ideals of a ring R such that $I + J = R$.
(a) Prove that $IJ = I \cap J$.
*(b) Prove the Chinese Remainder Theorem: For any pair $a, b$ of elements of R, there is
an element $x$ such that $x \equiv a$ (modulo $I$) and $x \equiv b$ (modulo $J$). [The notation
$x \equiv a$ (modulo $I$) means $x - a \in I$.]

8. Let $I, J$ be ideals of a ring R such that $I + J = R$ and $IJ = 0$.
(a) Prove that R is isomorphic to the product $(R/I) \times (R/J)$.
(b) Describe the idempotents corresponding to this product decomposition (see exercise
34, Section 3).


5. Adjunction of Elements


1. Describe the ring obtained from Z by adjoining an element $\alpha$ satisfying the two relations
$2\alpha - 6 = 0$ and $\alpha - 10 = 0$.

2. Suppose we adjoin an element $\alpha$ to R satisfying the relation $\alpha^2 = 1$. Prove that the
resulting ring is isomorphic to the product ring $R \times R$, and find the element of $R \times R$
which corresponds to $\alpha$.

3. Describe the ring obtained from the product ring $R \times R$ by inverting the element $(2, 0)$.

4. Prove that the elements $1, t - \alpha, (t - \alpha)^2, \ldots, (t - \alpha)^{n-1}$ form a C-basis for
$C[t]/((t - \alpha)^n)$.


5. Let $\alpha$ denote the residue of $x$ in the ring $R' = Z[x]/(x^4 + x^3 + x^2 + x + 1)$. Compute
the expressions for $(\alpha^3 + \alpha^2 + \alpha)(\alpha + 1)$ and $\alpha^5$ in terms of the basis $(1, \alpha, \alpha^2, \alpha^3)$.


6. In each case, describe the ring obtained from $F_2$ by adjoining an element $\alpha$ satisfying the
given relation.
(a) $\alpha^2 + \alpha + 1 = 0$ (b) $\alpha^2 + 1 = 0$

7. Analyze the ring obtained from Z by adjoining an element $\alpha$ which satisfies the pair of
relations $\alpha^3 + \alpha^2 + 1 = 0$ and $\alpha^2 + \alpha = 0$.

8. Let $a \in R$. If we adjoin an element $\alpha$ with the relation $\alpha = a$, we expect to get back a
ring isomorphic to R. Prove that this is so.

9. Describe the ring obtained from $Z/12Z$ by adjoining an inverse of 2.

10. Determine the structure of the ring $R'$ obtained from Z by adjoining an element $\alpha$
satisfying each set of relations.
(a) $2\alpha = 6$, $6\alpha = 15$ (b) $2\alpha = 6$, $6\alpha = 18$ (c) $2\alpha = 6$, $6\alpha = 8$

11. Let $R = Z/(10)$. Determine the structure of the ring obtained by adjoining an element $\alpha$
satisfying each relation.
(a) $2\alpha - 6 = 0$ (b) $2\alpha - 5 = 0$

12. Let $a$ be a unit in a ring R. Describe the ring $R' = R[x]/(ax - 1)$.

13. (a) Prove that the ring obtained by inverting $x$ in the polynomial ring $R[x]$ is isomorphic
to the ring of Laurent polynomials, as asserted in (5.9).
(b) Do the formal Laurent series $\sum a_n x^n$ form a ring?

14. Let $a$ be an element of a ring R, and let $R' = R[x]/(ax - 1)$ be the ring obtained by
adjoining an inverse of $a$ to R. Prove that the kernel of the map $R \to R'$ is the set of
elements $b \in R$ such that $a^n b = 0$ for some $n > 0$.

15. Let $a$ be an element of a ring R, and let $R'$ be the ring obtained from R by adjoining an
inverse of $a$. Prove that $R'$ is the zero ring if and only if $a$ is nilpotent.

16. Let F be a field. Prove that the rings $F[x]/(x^2)$ and $F[x]/(x^2 - 1)$ are isomorphic if and
only if F has characteristic 2.

17. Let $R = Z[x]/(2x)$. Prove that every element of R has a unique expression in the form
$a_0 + a_1 x + \cdots + a_n x^n$, where $a_i$ are integers and $a_1, \ldots, a_n$ are either 0 or 1.


6. Integral Domains and Fraction Fields 


Ber = 


*9; 


. Prove that a subring of an integral domain is an integral domain. 
. Prove that an integral domain with finitely many elements is a field. 


Let R be an integral domain. Prove that the polynomial ring R[x] is an integral domain. 


Let R be an integral domain. Prove that the invertible elements of the polynomial ring 
R[x] are the units in R. 
Is there an intégral domain containing exactly 10 elements? 


. Prove that the field of fractions of the formal power series ring F[[x]] over a field F is
obtained by inverting the single element x, and describe the elements of this field as
certain power series with negative exponents.

. Carry out the verification that the equivalence classes of fractions from an integral
domain form a field.


. A semigroup S is a set with an associative law of composition having an identity
element. Let S be a commutative semigroup which satisfies the cancellation law: ab = ac
implies b = c. Use fractions to prove that S can be embedded into a group.

A subset S of an integral domain R which is closed under multiplication and which does
not contain 0 is called a multiplicative set. Given a multiplicative set S, we define
S-fractions to be elements of the form a/b, where b ∈ S. Show that the equivalence
classes of S-fractions form a ring.


7. Maximal Ideals 


. Prove that the maximal ideals of the ring of integers are the principal ideals generated by
prime integers.

. Determine the maximal ideals of each of the following.

(a) R × R (b) R[x]/(x^2) (c) R[x]/(x^2 − 3x + 2) (d) R[x]/(x^2 + x + 1)

. Prove that the ideal (x + y^2, y + x^2 + 2xy^2 + y^4) in C[x, y] is a maximal ideal.
. Let R be a ring, and let I be an ideal of R. Let M be an ideal of R containing I, and let
M̄ = M/I be the corresponding ideal of R̄ = R/I. Prove that M̄ is maximal if and only if M is.


. Let I be the principal ideal of C[x, y] generated by the polynomial y^2 + x^3 − 17. Which
of the following sets generate maximal ideals in the quotient ring R = C[x, y]/I?
(a) (x − 1, y − 4) (b) (x + 1, y + 4) (c) (x^3 − 17, y^2)

. Prove that the ring F_5[x]/(x^2 + x + 1) is a field.
. Prove that the ring F_2[x]/(x^3 + x + 1) is a field, but that F_3[x]/(x^3 + x + 1) is not a
field.


. Let R = C[x_1, ..., x_n]/I be a quotient of a polynomial ring over C, and let M be a
maximal ideal of R. Prove that R/M ≈ C.

- Define a bijective correspondence between maximal ideals of R[x] and points in the
upper half plane.

Let R be a ring, with M an ideal of R. Suppose that every element of R which is not in M
is a unit of R. Prove that M is a maximal ideal and that moreover it is the only maximal
ideal of R.
Let P be an ideal of a ring R. Prove that R̄ = R/P is an integral domain if and only if
P ≠ R and, whenever a, b ∈ R and ab ∈ P, a ∈ P or b ∈ P. (An ideal P satisfying
these conditions is called a prime ideal.)
Let φ: R → R' be a ring homomorphism, and let P' be a prime ideal of R'.
(a) Prove that φ^{-1}(P') is a prime ideal of R.
(b) Give an example in which P' is a maximal ideal, but φ^{-1}(P') is not maximal.
Let R be an integral domain with fraction field F, and let P be a prime ideal of R. Let R_P
be the subset of F defined by

R_P = {a/d | a, d ∈ R, d ∉ P}.
This subset is called the localization of R at P.
(a) Prove that R_P is a subring of F.
(b) Determine all maximal ideals of R_P.
Find an example of a “ring without unit element” and an ideal not contained in a
maximal ideal.


8. Algebraic Geometry 


1. 


Determine the points of intersection of the two complex plane curves in each of the 
following. 

(a) y^2 − x^3 + x^2 = 1, x + y = 1

(b) x^2 + xy + y^2 = 1, x^2 + 2y^2 = 1

(c) y^2 = x^3, xy = 1

(d) x + y + y^2 = 0, x − y + y^2 = 0

(e) x + y^2 = 0, y + x^2 + 2xy^2 + y^4 = 0


. Prove that two quadratic polynomials f, g in two variables have at most four common
zeros, unless they have a nonconstant factor in common.
Derive the Hilbert Nullstellensatz from its classical form (8.7).
Let U, V be varieties in C^n. Prove that U ∪ V and U ∩ V are varieties.

Let f_1, ..., f_r; g_1, ..., g_s ∈ C[x_1, ..., x_n], and let U, V be the zeros of {f_1, ..., f_r},
{g_1, ..., g_s} respectively. Prove that if U and V do not meet, then (f_1, ..., f_r; g_1, ..., g_s) is
the unit ideal.

Let f = f_1 ··· f_m and g = g_1 ··· g_n, where f_i, g_j are irreducible polynomials in C[x, y].
Let S_i = {f_i = 0} and T_j = {g_j = 0} be the Riemann surfaces defined by these
polynomials, and let V be the variety f = g = 0. Describe V in terms of S_i, T_j.

. Prove that the variety defined by a set {f_1, ..., f_r} of polynomials depends only on the
ideal (f_1, ..., f_r) they generate.

Let R be a ring containing C as subring. 

(a) Show how to make R into a vector space over C. 

(b) Assume that R is a finite-dimensional vector space over C and that R contains ex- 
actly one maximal ideal M. Prove that M is the nilradical of R, that is, that M con- 
sists precisely of its nilpotent elements. 

Prove that the complex conic xy = 1 is homeomorphic to the plane, with one point
deleted.

Prove that every variety in C^2 is the union of finitely many points and algebraic curves.

The three polynomials f_1 = x^2 + y^2 − 1, f_2 = x^2 − y + 1, and f_3 = xy − 1 generate
the unit ideal in C[x, y]. Prove this in two ways: (i) by showing that they have no
common zeros, and (ii) by writing 1 as a linear combination of f_1, f_2, f_3, with polynomial
coefficients.

(a) Determine the points of intersection of the algebraic curve S: y^2 = x^3 − x^2 and the
line L: y = λx.

(b) Parametrize the points of S as a function of λ.

(c) Relate S to the complex λ-plane, using this parametrization.

The radical of an ideal I is the set of elements r ∈ R such that some power of r is in I.

(a) Prove that the radical of I is an ideal.

(b) Prove that the varieties defined by two sets of polynomials {f_1, ..., f_r}, {g_1, ..., g_s}
are equal if and only if the two ideals (f_1, ..., f_r), (g_1, ..., g_s) have the same radicals.

Let R = C[x_1, ..., x_n]/(f_1, ..., f_m). Let A be a ring containing C as subring. Find a
bijective correspondence between the following sets:

(i) homomorphisms φ: R → A which restrict to the identity on C, and

(ii) n-tuples a = (a_1, ..., a_n) of elements of A which solve the system of equations
f_1 = ... = f_m = 0, that is, such that f_i(a) = 0 for i = 1, ..., m.


Miscellaneous Exercises 


i 


6. 


Let F be a field, and let K denote the vector space F^2. Define multiplication by the rules

(a_1, a_2) · (b_1, b_2) = (a_1 b_1 − a_2 b_2, a_1 b_2 + a_2 b_1).

(a) Prove that this law and vector addition make K into a ring. 

(b) Prove that K is a field if and only if there is no element in F whose square is —1. 

(c) Assume that —1 is a square in F and that F does not have characteristic 2. Prove that 
K is isomorphic to the product ring F X F. 


. (a) We can define the derivative of an arbitrary polynomial f(x) with coefficients in a
ring R by the calculus formula (a_n x^n + ··· + a_1 x + a_0)' = n a_n x^{n−1} + ··· + 1·a_1.
The integer coefficients are interpreted in R using the homomorphism (3.9). Prove
the product formula (fg)' = f'g + fg' and the chain rule (f ∘ g)' = (f' ∘ g)g'.

(b) Let f(x) be a polynomial with coefficients in a field F, and let α be an element of F.
Prove that α is a multiple root of f if and only if it is a common root of f and of its
derivative f'.

(c) Let F = F_5. Determine whether or not the following polynomials have multiple roots
Te Gan Salient GRO Ga oom PP a Nh 


- Let R be a set with two laws of composition satisfying all the ring axioms except the 


commutative law for addition. Prove that this law holds by expanding the product 
(a + b)(c + d) in two ways using the distributive law. 


- Let R be a ring. Determine the units in the polynomial ring R[x]. 
. Let R denote the set of sequences a = (a_1, a_2, a_3, ...) of real numbers which are
eventually constant: a_n = a_{n+1} = ... for sufficiently large n. Addition and multiplication are
component-wise; that is, addition is vector addition and ab = (a_1 b_1, a_2 b_2, ...).

(a) Prove that R is a ring. 

(b) Determine the maximal ideals of R. 

(a) Classify rings R which contain C and have dimension 2 as vector space over C. 
*(b) Do the same as (a) for dimension 3.
*7. Consider the map φ: C[x, y] → C[x] × C[y] × C[t] defined by
f(x, y) ↦ (f(x, 0), f(0, y), f(t, t)). Determine the image of φ explicitly.

8. Let S be a subring of a ring R. The conductor C of S in R is the set of elements α ∈ R
such that αR ⊂ S.

(a) Prove that C is an ideal of R and also an ideal of S.
(b) Prove that C is the largest ideal of S which is also an ideal of R.
(c) Determine the conductor in each of the following three cases:
(i) R = C[t], S = C[t^2, t^3];
(ii) R = Z[ζ], ζ = ½(−1 + √−3), S = Z[√−3];
(iii) R = C[t, t^{-1}], S = C[t].

9. A line in C^2 is the locus of a linear equation L: {ax + by + c = 0}. Prove that there is a
unique line through two points (x_0, y_0), (x_1, y_1), and also that there is a unique line
through a point (x_0, y_0) with a given tangent direction (u_0, v_0).

10. An algebraic curve C in C^2 is called irreducible if it is the locus of zeros of an irreducible
polynomial f(x, y), that is, one which cannot be factored as a product of nonconstant
polynomials. A point p ∈ C is called a singular point of the curve if ∂f/∂x = ∂f/∂y = 0 at p.
Otherwise p is a nonsingular point. Prove that an irreducible curve has only finitely
many singular points.

11. Let L: ax + by + c = 0 be a line and C: {f = 0} a curve in C^2. Assume that b ≠ 0.
Then we can use the equation of the line to eliminate y from the equation f(x, y) = 0 of
C, obtaining a polynomial g(x) in x. Show that its roots are the x-coordinates of the
intersection points.

12. With the notation as in the preceding problem, the multiplicity of intersection of L and C
at a point p = (x_0, y_0) is the multiplicity of x_0 as a root of g(x). The line is called a
tangent line to C at p if the multiplicity of intersection is at least 2. Show that if p is a
nonsingular point of C, then there is a unique tangent line at (x_0, y_0), and compute it.

13. Show that if p is a singular point of a curve C, then the multiplicity of intersection of
every line through p is at least 2.

14. The degree of an irreducible curve C: {f = 0} is defined to be the degree of the irre- 
ducible polynomial f. 

(a) Prove that a line L meets C in at most d points, unless C = L. 
(b) Prove that there exist lines which meet C in precisely d points. 


15. Determine the singular points of x^3 + y^3 − 3xy = 0.
*16. Prove that an irreducible cubic curve can have at most one singular point.
*17. A nonsingular point p of a curve C is called a flex point if the tangent line L to C at p has
an intersection of multiplicity at least 3 with C at p.
(a) Prove that the flex points are the nonsingular points of C at which the Hessian

$$\det\begin{pmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial f}{\partial x} \\[2mm]
\dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2} & \dfrac{\partial f}{\partial y} \\[2mm]
\dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} & 0
\end{pmatrix}$$

vanishes.
(b) Determine the flex points of the cubic curves y^2 − x^3 and y^2 − x^3 + x^2.
*18. Let C be an irreducible cubic curve, and let L be a line joining two flex points of C.
Prove that if L meets C in a third point, then that point is also a flex.
19. Let U = {f_i(x_1, ..., x_m) = 0}, V = {g_j(y_1, ..., y_n) = 0} be two varieties. Show that the
variety defined by the equations {f_i(x) = 0, g_j(y) = 0} in C^{m+n} is the product set U × V.
20. Prove that the locus y = sin x in R^2 doesn’t lie on any algebraic curve.
*21. Let f,g be polynomials in C[x, y] with no common factor. Prove that the ring R = 
C[x, y]/(f, g) is a finite-dimensional vector space over C. 
22. (a) Let s,c denote the functions sin x, cos x on the real line. Prove that the ring R[s, c] 
they generate is an integral domain. 
(b) Let K = R(s,c) denote the field of fractions of R[s,c]. Prove that the field K is iso- 
morphic to the field of rational functions R(x). 
*23. Let f(x), g(x) be polynomials with coefficients in a ring R with f ≠ 0. Prove that if the
product f(x)g(x) is zero, then there is a nonzero element c ∈ R such that cg(x) = 0.
*24. Let X denote the closed unit interval [0, 1], and let R be the ring of continuous functions
X → R.
(a) Prove that a function f which does not vanish at any point of X is invertible in R.
(b) Let f_1, ..., f_n be functions with no common zero on X. Prove that the ideal generated
by these functions is the unit ideal. (Hint: Consider f_1^2 + ··· + f_n^2.)

(c) Establish a bijective correspondence between maximal ideals of R and points on the 
interval. 

(d) Prove that the maximal ideals containing a function f correspond to points of the in- 
terval at which f = 0. 

(e) Generalize these results to functions on an arbitrary compact set X in R^k.

(f) Describe the situation in the case X = R.


Rien n’est beau que le vrai.


Hermann Minkowski 


1. FACTORIZATION OF INTEGERS AND POLYNOMIALS


This chapter is a study of division in rings. Because it is modeled on properties of 
the ring of integers, we will begin by reviewing these properties. Some have been 
used without comment in earlier chapters of the book, and some have already been 
proved. 

The property from which all others follow is division with remainder: If a, b
are integers and a ≠ 0, there exist integers q, r so that

(1.1) b = aq + r,

and 0 ≤ r < |a|. This property is often stated only for positive integers, but we
allow a and b to take on negative values too. That is why we use the absolute value |a|
to bound the remainder. The proof of the existence of (1.1) is a simple induction
argument.
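
As an illustration of (1.1) with negative values allowed, here is a minimal Python sketch. The function name `div_with_remainder` is an assumption made for this example; it relies only on the built-in `divmod`, whose remainder is nonnegative when the modulus is positive.

```python
def div_with_remainder(b, a):
    """Return (q, r) with b == a*q + r and 0 <= r < |a|, for an integer a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    # For a positive modulus, divmod returns a remainder in the range 0 <= r < |a|.
    q, r = divmod(b, abs(a))
    if a < 0:
        # b = |a|*q + r = a*(-q) + r, so only the quotient changes sign.
        q = -q
    return q, r


print(div_with_remainder(-7, 3))   # quotient and remainder for b = -7, a = 3
print(div_with_remainder(7, -3))   # quotient and remainder for b = 7, a = -3
```

The remainder is bounded by |a| rather than by a, which is exactly the reason the absolute value appears in the statement.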

We've already seen some of the most important consequences of division with
remainder, but let us recall them. In Chapter 10, we saw that every subgroup of Z^+
is an ideal and that every ideal of Z is principal, that is, it has the form dZ for some
integer d ≥ 0. As was proved in Chapter 2 (2.6), this implies that a greatest
common divisor of a pair of integers a, b exists and that it is an integer linear
combination of a and b. If a and b have no factor in common other than ±1, then 1 is a
linear combination of a and b with integer coefficients:


(1.2) ra + sb = 1, 


for some r, s ∈ Z. This implies the fundamental property of prime integers, which
was proved in Chapter 3 (2.8). We restate it here:

(1.3) Proposition. Let p be a prime integer, and let a, b be integers. If p divides
the product ab, then p divides a or b. □
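
The coefficients r, s in (1.2) can be computed by repeated division with remainder. The following is a minimal sketch of the extended Euclidean algorithm; the function name `extended_gcd` is an assumption for this example, and the text itself derives (1.2) from the fact that ideals of Z are principal rather than from this procedure.

```python
def extended_gcd(a, b):
    """Return (d, r, s) with d = gcd(a, b) >= 0 and r*a + s*b == d."""
    # Invariants: old_s*a + old_t*b == old_r  and  s*a + t*b == r.
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        # Normalize the greatest common divisor to be nonnegative.
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


d, r, s = extended_gcd(15, 8)
print(d, r, s, r * 15 + s * 8)
```

When a and b have no common factor other than ±1, the returned d is 1 and the pair (r, s) is one solution of ra + sb = 1; such a solution is not unique.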


(1.4) Theorem. Fundamental Theorem of Arithmetic: Every integer a ≠ 0 can be
written as a product

a = c p_1 ··· p_k,

where c = ±1, the p_i are positive prime integers, and k ≥ 0. This expression is
unique except for the ordering of the prime factors.


Proof. First, a prime factorization exists. To prove this, it is enough to con- 
sider the case that a is greater than 1. By induction on a, we may assume the exis- 
tence proved for all positive integers b < a. Either a is prime, in which case the 
product has one factor, or there is a proper divisor b ≠ a. Then a = bb' and also
b' ≠ a. Both b and b' are smaller than a, and by induction they can be factored
into primes. Setting their factorizations side by side gives a factorization of a. 

Second, the factorization is unique. Suppose that

±p_1 ··· p_n = a = ±q_1 ··· q_m.

The signs certainly agree. We apply (1.3), with p = p_1. Since p_1 divides the product
q_1 ··· q_m, it divides some q_i, say q_1. Since q_1 is prime, p_1 = q_1. Cancel p_1 and
proceed by induction. □
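
The factorization a = c p_1 ··· p_k can be computed for small integers by trial division. This is only a sketch: trial division is a technique chosen here for brevity and is not the inductive argument of the proof, and the function name `factor` is an assumption for this example.

```python
def factor(a):
    """Return (c, primes) with a == c * product(primes), c = +1 or -1,
    and primes a list of positive primes in increasing order."""
    if a == 0:
        raise ValueError("0 has no prime factorization")
    c = 1 if a > 0 else -1
    n = abs(a)
    primes = []
    p = 2
    while p * p <= n:
        # Every divisor found here is prime, because smaller primes
        # have already been divided out of n.
        while n % p == 0:
            primes.append(p)
            n //= p
        p += 1
    if n > 1:
        primes.append(n)   # the remaining cofactor is prime
    return c, primes


print(factor(-360))
print(factor(1))    # k = 0: the empty product
```

The case a = ±1 returns an empty list of primes, which corresponds to k = 0 in the theorem.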


The structure of the ring of integers is closely analogous to that of a polyno- 
mial ring F[x] in one variable over a field. Whenever a property of one of these 
rings is derived, we should try to find an analogous property of the other. We have 
already discussed division with remainder for polynomials in Chapter 10, and we 
have seen that every ideal of the polynomial ring F[x] is principal [Chapter 10
(3.21)].

A polynomial p(x) with coefficients in a field F is called irreducible if it is not 
constant and if its only divisors of lower degree in F[x] are constants. This means 
that the only way that p can be written as a product of two polynomials is p = cp_1,
where c is a constant and p_1 is a constant multiple of p. The irreducible polynomials
are analogous to prime integers. It is customary to normalize them by factoring out 
their leading coefficients, so that they become monic. 

The proof of the following theorem is similar to the proof of the analogous 
statements for the ring of integers: 


(1.5) Theorem. Let F be a field, and let F[x] denote the polynomial ring in one 
variable over F. 


(a) If two polynomials f, g have no common nonconstant factor, then there are
polynomials r, s ∈ F[x] such that rf + sg = 1.

(b) If an irreducible polynomial p ∈ F[x] divides a product fg, then p divides one
of the factors f or g.

(c) Every nonzero polynomial f ∈ F[x] can be written as a product

f = c p_1 ··· p_k,

where c is a nonzero constant, the p_i are monic irreducible polynomials in
F[x], and k ≥ 0. This factorization is unique, except for the ordering of the
terms. □
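
Part (a) can be made concrete for F = Q by running the Euclidean algorithm on polynomials. The sketch below is an illustration under stated assumptions, not the proof: polynomials are represented as coefficient lists with the lowest degree first, the zero polynomial is the empty list, and the helper names `trim`, `add`, `mul`, `divmod_poly`, and `extended_gcd_poly` are all invented for this example.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients (lowest-degree-first lists)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(f, g):
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
                 for i in range(n)])


def mul(f, g):
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(f, g):
    """Return (q, r) with f = g*q + r and deg r < deg g; g must be nonzero."""
    g = trim(g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    r = trim(f)
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = Fraction(r[-1]) / g[-1]
        q[shift] = coeff
        # Subtract coeff * x^shift * g, which cancels the leading term of r.
        r = add(r, mul([Fraction(0)] * shift + [-coeff], g))
    return trim(q), r


def extended_gcd_poly(f, g):
    """Return (d, r, s) with r*f + s*g == d, where d is the monic gcd.
    Assumes f and g are not both zero."""
    r0, r1 = trim(f), trim(g)
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, rem = divmod_poly(r0, r1)
        neg_q = [-c for c in q]
        r0, r1 = r1, rem
        s0, s1 = s1, add(s0, mul(neg_q, s1))
        t0, t1 = t1, add(t0, mul(neg_q, t1))
    scale = [Fraction(1) / r0[-1]]   # make the gcd monic
    return mul(r0, scale), mul(s0, scale), mul(t0, scale)


f = [1, 0, 1]    # x^2 + 1
g = [-1, 1]      # x - 1
d, r, s = extended_gcd_poly(f, g)
print(d, r, s)
print(add(mul(r, f), mul(s, g)))   # should reproduce d
```

Exact rational arithmetic via `Fraction` is used so that the leading terms cancel exactly; with floating-point coefficients the degree test in the loop would not be reliable.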


The constant factor c which appears in the third part of this theorem is
analogous to the factor ±1 in (1.4). These are the units in their respective rings. The
unit factors are there because we normalized primes to be positive, and irreducible 
polynomials to be monic. We can allow negative primes or nonmonic irreducible 
polynomials if we wish. The unit factor can then be absorbed, if k > 0. But this 
complicates the statement of uniqueness slightly. 


(1.6) Examples. Over the complex numbers, every polynomial of positive degree
has a root α and therefore has a divisor of the form x − α. So the irreducible
polynomials are linear, and the irreducible factorization of a polynomial has the form

(1.7) f(x) = c(x − α_1) ··· (x − α_n),

where α_i are the roots of f(x), repeated as necessary. The uniqueness of this
factorization is not surprising.

When F = R, there are two classes of irreducible polynomials: linear
polynomials and irreducible quadratic polynomials. A real quadratic polynomial
x^2 + bx + c is irreducible if and only if its discriminant b^2 − 4c is negative, in
which case it has a pair of complex conjugate roots. The fact that every irreducible
polynomial over the complex numbers is linear implies that no higher-degree
polynomial is irreducible over the reals. Suppose that a polynomial f(x) has real
coefficients a_i and that α is a complex, nonreal root of f(x). Then the complex
conjugate ᾱ is different from α and is also a root. For, since f is a real polynomial, its
coefficients a_i satisfy the relation ā_i = a_i. Then

$$f(\bar\alpha) = a_n\bar\alpha^n + \cdots + a_1\bar\alpha + a_0 = \bar a_n\bar\alpha^n + \cdots + \bar a_1\bar\alpha + \bar a_0 = \overline{f(\alpha)} = \bar 0 = 0.$$

The quadratic polynomial g(x) = (x − α)(x − ᾱ) = x^2 − (α + ᾱ)x + αᾱ has
real coefficients −(α + ᾱ) and αᾱ, and both of its linear factors appear on the right
side of the complex factorization (1.7) of f(x). Thus g(x) divides f(x). So the
factorization of f(x) into irreducible real polynomials is obtained by grouping conjugate
pairs in the complex factorization. □
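
The grouping of conjugate pairs can be checked numerically on a small case. The sketch below uses the polynomial x^4 + 1, chosen here only as an illustration; the helper names `quad_from_root` and `poly_mul` are assumptions for this example, and the arithmetic is floating point, so equalities hold only up to rounding.

```python
import cmath
import math

# The four complex roots of x^4 + 1 are exp(i*pi*(2k+1)/4), k = 0, 1, 2, 3.
roots = [cmath.exp(1j * math.pi * (2 * k + 1) / 4) for k in range(4)]


def quad_from_root(alpha):
    """Real quadratic (x - alpha)(x - conj(alpha)) = x^2 - (alpha + conj(alpha))x + alpha*conj(alpha),
    as a coefficient list with the highest degree first."""
    return [1.0, -2.0 * alpha.real, abs(alpha) ** 2]


def poly_mul(f, g):
    """Multiply two coefficient lists (highest degree first)."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


# Take one root from each conjugate pair (the ones in the upper half plane).
upper = [r for r in roots if r.imag > 0]
quads = [quad_from_root(r) for r in upper]
product = poly_mul(quads[0], quads[1])

print(quads)     # two real quadratics, approximately x^2 -/+ sqrt(2) x + 1
print(product)   # approximately the coefficients of x^4 + 1
```

Both quadratics have negative discriminant, so they are irreducible over R, in agreement with the description of the two classes of real irreducible polynomials.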


Factorization of polynomials is more complicated for polynomials with rational 
coefficients than for real or complex polynomials, because there exist irreducible 
polynomials in Q[x] of arbitrary degree. For example, x^5 − 3x^4 + 3 is irreducible
in Q[x]. We will see more examples in Section 4. Neither the form of the irreducible
factorization nor its uniqueness is intuitively clear for rational polynomials. 

For future reference, we note the following elementary fact:

(1.8) Proposition. Let F be a field, and let f(x) be a polynomial of degree n with
coefficients in F. Then f has at most n roots in F.


Proof. An element α ∈ F is a root of f if and only if x − α divides f [Chapter
10 (3.20)]. If so, then we can write f(x) = (x − α)q(x), where q(x) is a polynomial
of degree n − 1. If β is another root of f, then f(β) = (β − α)q(β) = 0. Since F
is a field, the product of nonzero elements of F is not zero. So one of the two
elements β − α, q(β) is zero. In the first case β = α, and in the second case β is one
of the roots of q(x). By induction on n, we may assume that q(x) has at most n − 1
roots in F. Then there are at most n possibilities for β. □


The fact that F is a field is crucial to Theorem (1.5) and to Proposition (1.8), 
as the following example shows. Let R be the ring Z/8Z. Then in the polynomial 
ring R[x], we have 


x^2 − 1 = (x + 1)(x − 1) = (x + 3)(x − 3).


The polynomial x^2 − 1 has four roots modulo 8, and its factorization into
irreducible polynomials is not unique.
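
Both claims about Z/8Z are easy to confirm by direct computation. A minimal sketch, in which the helper name `poly_mul_mod` and the lowest-degree-first coefficient lists are choices made for this example:

```python
MOD = 8

# Roots of x^2 - 1 in Z/8Z, found by trying every residue.
roots = [x for x in range(MOD) if (x * x - 1) % MOD == 0]


def poly_mul_mod(f, g, m):
    """Multiply coefficient lists (lowest degree first), reducing modulo m."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return out


target = [(-1) % MOD, 0, 1]                      # x^2 - 1 modulo 8
first = poly_mul_mod([1, 1], [-1, 1], MOD)       # (x + 1)(x - 1)
second = poly_mul_mod([3, 1], [-3, 1], MOD)      # (x + 3)(x - 3)

print(roots)
print(first == target, second == target)
```

A polynomial of degree 2 with four roots is possible here because Z/8Z has zero divisors, so the step "a product of nonzero elements is not zero" in the proof of (1.8) fails.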


2. UNIQUE FACTORIZATION DOMAINS, PRINCIPAL IDEAL DOMAINS, 
AND EUCLIDEAN DOMAINS 


Having seen that factorization of polynomials is analogous to factorization of in- 
tegers, it is natural to ask whether other rings can have such properties. Relatively 
few such rings exist, but the ring of Gauss integers is one interesting example. This 
section explores ways in which various parts of the theory can be extended. 

We begin by introducing the terminology used in studying factorization. It is 
natural to assume that the given ring R is an integral domain, so that the Cancella- 
tion Law is available, and we will make this assumption throughout. We say that an 
element a divides another element b (abbreviated a | b) if b = aq for some q ∈ R.
The element a is a proper divisor of b if b = aq for some q ∈ R and if neither a nor
q is a unit. A nonzero element a of R is called irreducible if it is not a unit and if it
has no proper divisor. Two elements a,a’ are called associates if each divides the 
other. It is easily seen that a, a’ are associates if and only if they differ by a unit fac- 
tor, that is, if a’ = ua for some unit u. 

The concepts of divisor, unit, and associate can be interpreted in terms of the
principal ideals generated by the elements. Remember that an ideal I is called
principal if it is generated by a single element:


(2.1) I = (a). 


Keep in mind the fact that (a) consists of all elements which are multiples of a, that
is, which are divisible by a. Then

(2.2) u is a unit ⇔ (u) = (1)
a and a' are associates ⇔ (a) = (a')
a divides b ⇔ (a) ⊃ (b)
a is a proper divisor of b ⇔ (1) > (a) > (b).


The proof of these equivalences is straightforward, and we omit it. 

Now suppose that we hope for a theorem analogous to the Fundamental Theo- 
rem of Arithmetic in an integral domain R. We may divide the statement of the the- 
orem into two parts. First, a given element a must be a product of irreducible ele- 
ments, and second, this product must be essentially unique. 

Consider the first part. We assume that our element a is not zero and not a unit; 
otherwise we have no hope of writing it as a product of irreducible elements. Then 
we attempt to factor a, proceeding as follows: If a is irreducible itself, we are done. 
If not, then a has a proper factor, so it decomposes in some way as a product, 
a = a_1 b_1, where neither a_1 nor b_1 is a unit. We continue factoring a_1 and b_1 if
possible, and we hope that this procedure terminates; in other words, we hope that after a
finite number of steps all the factors are irreducible. The condition that this proce- 
dure always terminates has a neat description in terms of principal ideals: 


(2.3) Proposition. Let R be an integral domain. The following conditions are 
equivalent: 


(a) For every nonzero element a of R which is not a unit, the process of factoring
a terminates after finitely many steps and results in a factorization a = b_1 ··· b_k
of a into irreducible elements of R.


(b) R does not contain an infinite increasing chain of principal ideals
(a_1) < (a_2) < (a_3) < ....


Proof. Suppose that R contains an infinite increasing sequence
(a_1) < (a_2) < .... Then (a_n) < (1) for every n, because (a_n) < (a_{n+1}) ⊂ (1). Since
(a_{n−1}) < (a_n), a_n is a proper divisor of a_{n−1}, say a_{n−1} = a_n b_n where a_n, b_n are not
units. This provides a nonterminating sequence of factorizations of a_1: a_1 = a_2 b_2 =
a_3 b_3 b_2 = a_4 b_4 b_3 b_2 .... Conversely, such a sequence of factorizations gives us an
increasing chain of ideals. □


The second condition of this proposition is often called the ascending chain
condition for principal ideals. However, to emphasize the factorization property, we
will say that existence of factorizations holds in R if the equivalent conditions of the
proposition are true.

It is easy to describe domains in which existence of factorizations fails. One
example is obtained by adjoining all 2^k-th roots of x_1 to the polynomial ring F[x_1]:


(2.4) R = F[x_1, x_2, x_3, ...],

with the relations x_2^2 = x_1, x_3^2 = x_2, x_4^2 = x_3, and so on. We can factor the element
x_1 indefinitely in this ring, and correspondingly there is an infinite chain
(x_1) < (x_2) < ... of principal ideals.

It turns out that we need infinitely many generators for a ring to make an ex- 
ample such as the one just given, so we will rarely encounter such rings. In practice, 
the second part of the Fundamental Theorem is the one which gives the most trou- 
ble. Factorization into irreducible elements will usually be possible, but it will not be 
unique. 

Units in a ring complicate the statement of uniqueness. It is clear that unit
factors should be disregarded, since there is no end to the possibility of adding unit
factors in pairs uu^{-1}. For the same reason, associate factors should be considered
equivalent. The units in the ring of integers are ±1, and in this ring it was natural to
normalize irreducible elements (primes) to be positive; similarly, we may normalize 
irreducible polynomials by insisting that they be monic. We don’t have a reasonable 
way to normalize elements of an arbitrary integral domain, so we will allow some 
ambiguity. It is actually neater to work with principal ideals than with elements: As- 
sociates generate the same principal ideal. However, it isn’t too cumbersome to use 
elements here, and we will stay with them. The importance of ideals will become 
clear in the later sections of this chapter. 

We will call an integral domain R a unique factorization domain if it has the
following properties:


(2.5)


(i) Existence of factorizations is true for R. In other words, the process of
factoring a nonzero element a which is not a unit terminates after finitely many steps
and yields a factorization a = p_1 ··· p_m, where each p_i is irreducible.

(ii) The irreducible factorization of an element is unique in the following sense: If
a is factored in two ways into irreducible elements, say a = p_1 ··· p_m =
q_1 ··· q_n, then m = n, and with suitable ordering of the factors, p_i is an
associate of q_i for each i.


So in the statement of uniqueness, associate factorizations are considered equivalent. 
Here is an example in which uniqueness of factorization is not true. The ring is 
the integral domain 


(2.6) R = Z[√−5].

It consists of all complex numbers of the form a + b√−5, where a, b ∈ Z. The
units in this ring are ±1, and the integer 6 has two essentially different
factorizations in R:


(2.7) 6 = 2·3 = (1 + √−5)(1 − √−5).

It is not hard to show that all four terms 2, 3, 1 + √−5, 1 − √−5 are irreducible
elements of R. Since the units are ±1, the associates of 2 are 2 and −2. So 2 is not an
associate of 1 ± √−5, which shows that the two factorizations are essentially
different and hence that R is not a unique factorization domain.
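
One standard way to check these claims, which the text does not spell out, uses the multiplicative norm N(a + b√−5) = a^2 + 5b^2. The four elements have norms 4, 9, 6, 6, so a proper divisor of any of them would have norm 2 or 3, and a unit must have norm 1. The sketch below, in which the names `norm`, `mul`, and `elements_of_norm` and the pair representation (a, b) for a + b√−5 are assumptions for this example, searches for such elements directly.

```python
import math


def norm(a, b):
    """N(a + b*sqrt(-5)) = a^2 + 5 b^2, multiplicative on Z[sqrt(-5)]."""
    return a * a + 5 * b * b


def mul(x, y):
    """Product of a + b*sqrt(-5) and c + d*sqrt(-5), as an integer pair."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)


def elements_of_norm(n):
    """All pairs (a, b) with a^2 + 5 b^2 == n; the search box |a|, |b| <= isqrt(n) suffices."""
    bound = math.isqrt(n)
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]


print(mul((2, 0), (3, 0)), mul((1, 1), (1, -1)))   # both factorizations of 6
print(elements_of_norm(1))                          # the units
print(elements_of_norm(2), elements_of_norm(3))     # candidates for proper divisors
```

Since no element has norm 2 or 3, none of 2, 3, 1 ± √−5 has a proper divisor, and since the only elements of norm 1 are ±1, those are the only units.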




The crucial property of prime integers is that if a prime divides a product, it 
divides one of the factors. We will call an element p of an integral domain R prime if 
it has these properties: p is not zero and not a unit, and if p divides a product of ele- 
ments of R, it divides one of the factors. These are the properties from which 
uniqueness of the factorization is derived. 


(2.8) Proposition. Let R be an integral domain. Suppose that existence of factor- 
izations holds in R. Then R is a unique factorization domain if and only if every irre- 
ducible element is prime. 


The proof is a simple extension of the arguments used in (1.3) and (1.4); we leave it
as an exercise. □


It is important to distinguish between the two concepts of irreducible
element and prime element. They are equivalent in unique factorization domains,
but most rings contain irreducible elements which are not prime. For instance, in
the ring R = Z[√−5] considered above, the element 2 has no proper factor,
so it is irreducible. It is not prime because, though it divides the product 6 =
(1 + √−5)(1 − √−5), it does not divide either factor.

Since irreducible elements in a unique factorization domain are prime, the
phrases irreducible factorization and prime factorization are synonymous. We can 
use them interchangeably when we are working in a unique factorization domain, 
but not otherwise. 

There is a simple way of deciding whether an element a divides another ele- 
ment b in a unique factorization domain, in terms of their irreducible (or prime) fac- 
torizations. 


(2.9) Proposition. Let R be a unique factorization domain, and let a = p_1 ··· p_r,
b = q_1 ··· q_s be given prime factorizations of two elements of R. Then a divides b in
R if and only if s ≥ r, and with a suitable ordering of the factors q_i of b, p_i is an
associate of q_i for i = 1, ..., r. □


(2.10) Corollary. Let R be a unique factorization domain, and let a, b be elements 
of R which are not both zero. There exists a greatest common divisor d of a,b, with 
the following properties: 


(i) d divides a and b;
(ii) if an element e of R divides a and b, then e divides d. □


It follows immediately from the second condition that any two greatest common di- 
visors of a,b are associates. However, the greatest common divisor need not have 
the form ra + sb. For example, we will show in the next section that the integer 
polynomial ring Z[x] is a unique factorization domain [see (3.8)]. In this ring, the 
elements 2 and x have greatest common divisor 1, but 1 is not a linear combination 
of these elements with integer polynomial coefficients: setting x = 0 in an equation
2r(x) + x s(x) = 1 would give 2r(0) = 1, which has no integer solution.

Another important property of the ring of integers is that every ideal of Z is
principal. An integral domain in which every ideal is principal is called a principal 
ideal domain. 


(2.11) Proposition.


(a) In an integral domain, a prime element is irreducible.
(b) In a principal ideal domain, an irreducible element is prime.


We leave the proofs of (2.9-2.11) as exercises. □


(2.12) Theorem. A principal ideal domain is a unique factorization domain. 


Proof. Suppose that R is a principal ideal domain. Then every irreducible ele- 
ment of R is prime. So according to Proposition (2.8), we need only prove the exis- 
tence of factorizations for R. By Proposition (2.3), this is equivalent to showing that 
R contains no infinite increasing chain of principal ideals. We argue by
contradiction. Suppose that (a_1) < (a_2) < (a_3) < ... is such a chain.


(2.13) Lemma. Let R be any ring. The union of an increasing chain of ideals
I_1 ⊂ I_2 ⊂ I_3 ⊂ ... is an ideal.


Proof. Let I denote the union of the chain. If u, v are in I, then they are in I_n
for some n. Then u + v and ru, for any r ∈ R, are also in I_n; hence they are in I. □


We apply this lemma to the union I of our chain of principal ideals and use the
hypothesis that R is a principal ideal domain to conclude that I is principal, say
I = (b). Now since b is in the union of the ideals (a_n), it is in one of these ideals.
But if b ∈ (a_n), then (b) ⊂ (a_n), and on the other hand (a_n) ⊂ (a_{n+1}) ⊂ (b).
Therefore (a_n) = (a_{n+1}) = (b). This contradicts the assumption that (a_n) < (a_{n+1}), and
this contradiction completes the proof. □


The converse of Theorem (2.12) is not true. The ring Z[x] of integer polyno- 
mials is a unique factorization domain [see (3.8)], but it is not a principal ideal do- 
main. 


(2.14) Proposition. 


(a) Let p be a nonzero element of a principal ideal domain R. Then R/(p) is a field 
if and only if p is irreducible. 


(b) The maximal ideals are the principal ideals generated by irreducible elements. 


Proof. Since an ideal M is maximal if and only if R/M is a field, the two parts 
are equivalent. We will prove the second part. A principal ideal (a) contains another 
principal ideal (b) if and only if a divides b. The only divisors of an irreducible ele- 
ment p are the units and the associates of p. Therefore the only principal ideals 
which contain (p) are (p) and (1). Since every ideal of R is principal, this shows that 
an irreducible element generates a maximal ideal. Conversely, let b be an element
having a proper factorization b = aq, where neither a nor q is a unit. Then
(b) < (a) < (1), and this shows that (b) is not maximal. □


Let us now abstract the procedure of division with remainder. To do so, we 
need a notion of size of an element of a ring. Appropriate measures are 


(2.15)    absolute value, if R = Z,
          degree of a polynomial, if R = F[x],
          (absolute value)^2, if R = Z[i].

In general, a size function on an integral domain R will be any function

(2.16)    σ: R - {0} → {0, 1, 2, ...}


from the set of nonzero elements of R to the nonnegative integers. An integral
domain R is a Euclidean domain if there is a size function σ on R such that the
division algorithm holds:


(2.17) Let a, b ∈ R and suppose that a ≠ 0. There are elements q, r ∈ R
such that b = aq + r, and either r = 0 or σ(r) < σ(a).


We do not require the elements q, r to be uniquely determined by a and b.

(2.18) Proposition. The rings Z, F[x], and Z[i] are Euclidean domains. □


The ring of integers and the polynomial ring have already been discussed. Let us
show that the ring of Gauss integers is a Euclidean domain, with size function the
function σ = | |^2. The elements of Z[i] form a square lattice in the complex plane,
and the multiples of a given element α form a similar lattice, the ideal (α) = Rα. If
we write α = re^{iθ}, then (α) is obtained by rotating through the angle θ followed by
stretching by the factor r = |α|:


(2.19) Figure. * = ideal (α), R = Z[i]

It is clear that for every complex number β, there is at least one point of the
lattice (α) whose square distance from β is ≤ (1/2)|α|^2. Let that point be αq, and set
r = β - αq. Then |r|^2 ≤ (1/2)|α|^2 < |α|^2, as required. Note that since there may be
more than one choice for the element αq, this division with remainder is not unique.

We could also proceed algebraically. We divide the complex number β by α:
β = αw, where w = x + yi is a complex number, not necessarily a Gauss integer.
Then we choose the nearest Gauss integer point (m, n) to (x, y), writing
x = m + x_0, y = n + y_0, where m, n are integers and x_0, y_0 are real numbers such
that -1/2 ≤ x_0, y_0 < 1/2. Then (m + ni)α is the required point of Rα. For,
|x_0 + y_0 i|^2 ≤ 1/2 and |β - (m + ni)α|^2 = |α(x_0 + y_0 i)|^2 ≤ (1/2)|α|^2.

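
The algebraic form of this division can be carried out in exact integer
arithmetic. The following Python sketch is only an illustration: the function name
gauss_divmod and the encoding of a Gauss integer as a pair (real part, imaginary
part) are assumptions made here, not notation from the text. It rounds each
coordinate of the complex quotient to a nearest integer, as described above.

```python
def gauss_divmod(b, a):
    """Divide the Gauss integer b by a (a != 0); each is a pair (re, im).

    Returns (q, r) with b = a*q + r and |r|^2 <= |a|^2 / 2.
    """
    (b1, b2), (a1, a2) = b, a
    norm = a1 * a1 + a2 * a2              # |a|^2, a positive integer
    # b / a = b * conj(a) / |a|^2 = (x_num + y_num*i) / norm
    x_num = b1 * a1 + b2 * a2
    y_num = b2 * a1 - b1 * a2
    # nearest integers to x_num/norm and y_num/norm, i.e. floor(x + 1/2)
    m = (2 * x_num + norm) // (2 * norm)
    n = (2 * y_num + norm) // (2 * norm)
    # remainder r = b - a*(m + n*i)
    r = (b1 - (a1 * m - a2 * n), b2 - (a1 * n + a2 * m))
    return (m, n), r


q, r = gauss_divmod((7, 3), (2, 1))       # divide 7 + 3i by 2 + i
print(q, r)
```

For 7 + 3i divided by 2 + i this gives quotient 3 and remainder 1, and indeed
|1|^2 = 1 is less than |2 + i|^2 = 5.
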
One can copy the discussion of factorization of integers with minor changes to 
prove this proposition: 


(2.20) Proposition. A Euclidean domain is a principal ideal domain, and hence it
is a unique factorization domain. □


(2.21) Corollary. The rings Z, Z[i], and F[x] (F a field) are principal ideal
domains and unique factorization domains. □


In the ring Z[i] of Gauss integers, the element 3 is irreducible, hence prime,
but 2 and 5 are not irreducible because


(2.22)    2 = (1 + i)(1 - i)  and  5 = (2 + i)(2 - i).


These are the prime factorizations of 2 and 5 in Z[i]. 

There are four units in the ring Z[i], namely {±1, ±i}. So every nonzero
element α of this ring has four associates, namely the elements ±α, ±iα. The
associates of 2 + i, for example, are

    2 + i,  -2 - i,  -1 + 2i,  1 - 2i.

There is no really natural way to normalize primes in Z[i], though if pressed we
would choose the unique associate lying in the first quadrant and not on the imagi- 
nary axis. It is better to accept the ambiguity of (2.5) here or else work with princi- 
pal ideals. 


3. GAUSS’S LEMMA 


Theorem (1.5) applies to the ring Q[x] of polynomials with rational coefficients:
Every polynomial f(x) ∈ Q[x] can be expressed uniquely in the form c p_1 ... p_k,
where c ∈ Q and the p_i are monic polynomials which are irreducible over Q. Now
suppose that a polynomial f(x) has integer coefficients, f(x) ∈ Z[x], and that it factors
in Q[x]. Can it be factored without leaving Z[x]? We are going to prove that it can, 
and that Z[x] is a unique factorization domain. 

Here is an example of a prime factorization in Z[x]: 


    6x^3 + 9x^2 + 9x + 3 = 3(2x + 1)(x^2 + x + 1).




As we see from this example, irreducible factorizations are slightly more
complicated in Z[x] than in Q[x]. First, the prime integers are irreducible elements
of Z[x], so they may appear in the prime factorization of a polynomial. Second, the
factor 2x + 1 isn’t monic. If we want to stay with integer coefficients, we can’t ask
for monic factors.

The integer factors of a polynomial f(x) = a_n x^n + ... + a_0 in Z[x] are common
divisors of its coefficients a_0, ..., a_n. A polynomial f(x) is called primitive if its
coefficients a_0, ..., a_n have no common integer factor except for the units ±1 and if
its highest coefficient a_n is positive.


(3.1) Lemma. Every nonzero polynomial f(x) ∈ Q[x] can be written as a product

    f(x) = c f_0(x),

where c is a rational number and f_0(x) is a primitive polynomial in Z[x]. Moreover,
this expression for f is unique. The polynomial f has integer coefficients if and only if
c is an integer. If so, then |c| is the greatest common divisor of the coefficients of f,
and the sign of c is the sign of the leading coefficient of f.


The rational number c which appears in this lemma is called the content of 
f(x). If f has integer coefficients, then the content divides f in Z[x]. Also, f is primi- 
tive if and only if its content is 1. 


Proof of the Lemma. To find f_0, we first multiply f by an integer to clear the
denominators in its coefficients. This will give us a polynomial f_1 with integer
coefficients. Then we factor out the greatest common divisor of the coefficients of
f_1, and adjust the sign of the leading coefficient. The resulting polynomial f_0 is
primitive, and f = c f_0 for some rational number c. This proves existence.

To prove uniqueness, suppose that c f_0(x) = d g_0(x), where c, d ∈ Q and f_0, g_0
are primitive polynomials. We will show that c = d and f_0 = g_0. Clearing
denominators reduces us to the case that c and d are integers. Let {a_i}, {b_i} denote
the coefficients of f_0, g_0 respectively. Then c a_i = d b_i for all i. Since the greatest
common divisor of {a_0, ..., a_n} is 1, c is, up to sign, the greatest common divisor of
{c a_0, ..., c a_n}. Similarly, d is, up to sign, the greatest common divisor of
{d b_0, ..., d b_n} = {c a_0, ..., c a_n}. Hence c = ±d and f_0 = ±g_0. Since f_0 and g_0
have positive leading coefficients, f_0 = g_0 and c = d. If f has integer coefficients,
clearing of the denominator is not necessary; hence c is an integer, and up to sign it
is the greatest common divisor of the coefficients, as stated. □
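
The existence half of this proof is effectively an algorithm. A minimal Python
sketch of it follows; the function name content_and_primitive and the convention of
listing coefficients from the constant term upward are choices made here for
illustration only.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def content_and_primitive(coeffs):
    """coeffs = [a_0, ..., a_n]: rational coefficients of a nonzero polynomial f.

    Returns (c, f0) with f = c * f0, where f0 is primitive with integer coefficients.
    """
    coeffs = [Fraction(a) for a in coeffs]
    # clear denominators by multiplying with their least common multiple
    denom = reduce(lambda u, v: u * v // gcd(u, v),
                   (a.denominator for a in coeffs), 1)
    ints = [int(a * denom) for a in coeffs]
    g = reduce(gcd, ints)                       # gcd of the integer coefficients
    lead = next(a for a in reversed(ints) if a != 0)
    sign = 1 if lead > 0 else -1                # make the leading coefficient positive
    f0 = [sign * a // g for a in ints]
    return Fraction(sign * g, denom), f0


c, f0 = content_and_primitive([3, 9, 9, 6])     # 6x^3 + 9x^2 + 9x + 3
print(c, f0)
```

For the polynomial 6x^3 + 9x^2 + 9x + 3 the content is 3 and the primitive part is
2x^3 + 3x^2 + 3x + 1.
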


As we have already observed, the Substitution Principle gives us a homomorphism

(3.2)    Z[x] → F_p[x],

where F_p = Z/pZ is the field with p elements. This homomorphism sends a
polynomial f(x) = a_m x^m + ... + a_0 to its residue
\bar{f}(x) = \bar{a}_m x^m + ... + \bar{a}_0 modulo p. We will now use it to prove
Gauss’s Lemma.

(3.3) Theorem. Gauss’s Lemma: A product of primitive polynomials in Z[x] is
primitive.

Proof. Let the polynomials be f and g, and let h = fg be their product. Since
the leading coefficients of f and g are positive, the leading coefficient of h is, too. To
show that h is primitive, we will show that no prime integer p divides all the
coefficients of h(x). This will show that the content of h is 1. Consider the
homomorphism Z[x] → F_p[x] defined above. We have to show that \bar{h} ≠ 0. Since f
is primitive, its coefficients are not all divisible by p. So \bar{f} ≠ 0. Similarly,
\bar{g} ≠ 0. Since the polynomial ring F_p[x] is an integral domain,
\bar{h} = \bar{f}\bar{g} ≠ 0, as required. □
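
As a quick numerical illustration of Gauss’s Lemma (not a substitute for the
proof), the sketch below multiplies two primitive polynomials and checks that the
product is again primitive. The helper names poly_mul and is_primitive are
hypothetical, and coefficients are listed from the constant term upward.

```python
from functools import reduce
from math import gcd


def poly_mul(f, g):
    """Product of two integer polynomials given as coefficient lists [a_0, ..., a_n]."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


def is_primitive(f):
    """True if the coefficients have gcd 1 and the leading coefficient is positive."""
    return reduce(gcd, f) == 1 and f[-1] > 0


f = [1, 2]            # 2x + 1
g = [1, 1, 1]         # x^2 + x + 1
h = poly_mul(f, g)    # coefficients of 2x^3 + 3x^2 + 3x + 1
print(h, is_primitive(f), is_primitive(g), is_primitive(h))
```
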


(3.4) Proposition. 


(a) Let f, g be polynomials in Q[x], and let f_0, g_0 be the associated primitive
polynomials in Z[x]. If f divides g in Q[x], then f_0 divides g_0 in Z[x].

(b) Let f be a primitive polynomial in Z[x], and let g be any polynomial with
integer coefficients. Suppose that f divides g in Q[x], say g = fq, with
q ∈ Q[x]. Then q ∈ Z[x], and hence f divides g in Z[x].

(c) Let f, g be polynomials in Z[x]. If they have a common nonconstant factor in
Q[x], then they have a common nonconstant factor in Z[x] too.


Proof. To prove (a), we may clear denominators so that f and g become
primitive. Then (a) is a consequence of (b). To prove (b), we apply (3.1) in order to
write the quotient in the form q = c q_0, where q_0 is primitive and c ∈ Q. By Gauss’s
Lemma, f q_0 is primitive, and the equation g = c f q_0 shows that it is the primitive
polynomial g_0 associated to g. Therefore g = c g_0 is the expression for g referred to
in Lemma (3.1), and c is the content of g. Since g ∈ Z[x], it follows that c ∈ Z,
hence that q ∈ Z[x]. Finally, to prove (c), suppose that f, g have a common factor h
in Q[x]. We may assume that h is primitive, and then by (b) h divides both f and g
in Z[x]. □


(3.5) Corollary. If a nonconstant polynomial f is irreducible in Z[x], then it is
irreducible in Q[x]. □


(3.6) Proposition. Let f be an integer polynomial with positive leading coef- 
ficient. Then f is irreducible in Z[x] if and only if either 


(i) f is a prime integer, or
(ii) f is a primitive polynomial which is irreducible in Q[x].


Proof. Suppose that f is irreducible. As in Lemma (3.1), we may write
f = c f_0, where f_0 is primitive. Since f is irreducible, this can not be a proper
factorization. So either c or f_0 is 1. If f_0 = 1, then f is constant, and to be
irreducible, a constant polynomial must be a prime integer. If c = 1, then f is
primitive, and is irreducible in Q[x] by the previous corollary. The converse, that
integer primes and primitive irreducible polynomials are irreducible elements of
Z[x], is clear. □

(3.7) Proposition. Every irreducible element of Z[x] is a prime element.

Proof. Let f be irreducible, and suppose f divides gh, where g, h ∈ Z[x].

Case 1: f = p is a prime integer. Write g = c g_0 and h = d h_0 as in (3.1). Then
g_0 h_0 is primitive, and hence some coefficient a of g_0 h_0 is not divisible by p. But
since p divides gh, the corresponding coefficient, which is cda, is divisible by p.
Hence p divides c or d, so p divides g or h.

Case 2: f is a primitive polynomial which is irreducible in Q[x]. By (2.11b), f is a
prime element of Q[x]. Hence f divides g or h in Q[x]. By (3.4), f divides g or h in
Z[x]. □


(3.8) Theorem. The polynomial ring Z[x] is a unique factorization domain. Every
nonzero polynomial f(x) ∈ Z[x] which is not ±1 can be written as a product

    f(x) = ±p_1 ... p_m q_1(x) ... q_n(x),

where the p_i are prime integers and the q_i(x) are irreducible primitive polynomials.
This expression is unique up to arrangement of the factors.


Existence of factorizations is easy to prove for Z[x], so this theorem follows from
Propositions (3.7) and (2.8). □


Now let R be any unique factorization domain, and let F be its field of fractions 
[Chapter 10 (6.5)]. Then R[x] is a subring of F[x], and the results of this section 
can be copied, replacing Z by R and Q by F throughout. The only change to be
made is that instead of normalizing primitive polynomials it is better to allow ambi- 
guity caused by unit factors, as in the previous section. The main results are these: 


(3.9) Theorem. Let R be a unique factorization domain with field of fractions F. 
(a) Let f, g be polynomials in F [x], and let fo, go be the associated primitive poly- 
nomials in R[x]. If f divides g in F[x], then fo divides go in R[x]. 
(b) Let f be a primitive polynomial in R[x], and let g be any polynomial in R[x].
Suppose that f divides g in F[x], say g = fq, with q ∈ F[x]. Then q ∈ R[x],
and hence f divides g in R[x].
(c) Let f, g be polynomials in R[x]. If they have a common nonconstant factor in
F[x], then they have a common nonconstant factor in R[x] too.
(d) If a nonconstant polynomial f is irreducible in R[x], then it is irreducible in
F[x].
(e) R[x] is a unique factorization domain.

The proof of Theorem (3.9) follows the pattern established for the ring Z[x], and we
omit it. □


Since R[x_1, ..., x_n] ≈ R[x_1, ..., x_{n-1}][x_n], we obtain this corollary:


(3.10) Corollary. The polynomial rings Z[x_1, ..., x_n] and F[x_1, ..., x_n], where F
is a field, are unique factorization domains. □

So the ring C[x, y] of complex polynomials in two variables is a unique
factorization domain. In contrast to the case of one variable, however, where every
complex polynomial is a product of linear ones, complex polynomials in two variables
are often irreducible, and hence prime.

The irreducibility of a polynomial f(x, y) can sometimes be proved by studying
the locus W = {f(x, y) = 0} in C^2. Suppose that f factors, say


    f(x, y) = g(x, y)h(x, y),


where g, h are nonconstant polynomials. Then f(x, y) = 0 if and only if one of the
two equations g(x, y) = 0 or h(x, y) = 0 holds. So if we let U = {g(x, y) = 0},
V = {h(x, y) = 0} denote these two varieties in C^2, then


    W = U ∪ V.


It may be possible to see geometrically that W has no such decomposition. 
For example, we can use this method to show that the polynomial


    f(x, y) = x^2 + y^2 - 1


is irreducible. Since the total degree of f is 2, any proper factor of f has to be linear,
of the form g(x, y) = ax + by + c. And the solutions to a linear equation lie on a
line, whereas {f = 0} is a circle. Of course when we speak of lines and circles, we
are actually talking about the real loci in R^2. So this reasoning shows that f is
irreducible in R[x, y]. But in fact, the real locus of a circle has enough points to show
irreducibility in C[x, y] too. Suppose that f = gh in C[x, y], where g and h are
linear as before. Then every point of the real circle x^2 + y^2 - 1 = 0 lies on one of
the complex loci U, V. So at least one of these loci contains two real points. There is 
exactly one complex line (a line being the locus of solutions of a linear equation 
ax + by + c = 0) which passes through two given points, and if these points are 
real, the linear equation defining the line is also real, up to a constant factor. This is 
proved by writing down the equation of a line through two points explicitly. So if f 
has a linear factor, then it has a real one. But the circle does not contain a line. 

One can also prove that x^2 + y^2 - 1 is irreducible algebraically, using the
method of undetermined coefficients (see Section 4, exercise 18).


4. EXPLICIT FACTORIZATION OF POLYNOMIALS


We now pose the problem of determining the factors of a given integer polynomial

(4.1)    f(x) = a_n x^n + ... + a_1 x + a_0.


What we want are the irreducible factors in Q[x], and by (3.5) this amounts to
determining the irreducible factors in Z[x]. Linear factors can be found fairly easily.
If b_1 x + b_0 divides f(x), then b_1 divides a_n and b_0 divides a_0. There are finitely
many integers which divide a_n and a_0, so we can try all possibilities. In each case,
we carry out the division and determine whether the remainder is zero. Or we may
substitute the rational number r = -b_0/b_1 into f(x) to see if it is a root.
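
The search for linear factors just described is easy to mechanize. The Python
sketch below is one way to do it, under the assumption that both the leading and
the constant coefficient are nonzero; the names divisors and rational_roots are
illustrative only. Coefficients are listed from the constant term upward.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """coeffs = [a_0, ..., a_n], integers with a_0 != 0 and a_n != 0.

    Tries every r = -b_0/b_1 with b_1 dividing a_n and b_0 dividing a_0.
    """
    a0, an = coeffs[0], coeffs[-1]
    roots = set()
    for b1 in divisors(an):
        for b0 in divisors(a0):
            for s in (1, -1):
                r = Fraction(-s * b0, b1)
                value = Fraction(0)
                for a in reversed(coeffs):      # Horner evaluation of f(r)
                    value = value * r + a
                if value == 0:
                    roots.add(r)
    return sorted(roots)


print(rational_roots([3, 9, 9, 6]))             # 6x^3 + 9x^2 + 9x + 3
```

For 6x^3 + 9x^2 + 9x + 3 the only rational root is -1/2, which corresponds to the
linear factor 2x + 1 found earlier.
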

Though things are not so clear for factors of higher degree, Kronecker showed
that the factors can be determined with a finite number of computations. His method
is based on the Lagrange interpolation formula. Unfortunately this method requires
too many steps to be practical except for factors of low degree, and a lot of work has
been done on the problem of efficient computation. One of the most useful methods
is computation modulo p, using the homomorphism Z[x] → F_p[x]. If our
polynomial f(x) factors in Z[x]: f = gh, then its residue \bar{f}(x) modulo p also
factors: \bar{f} = \bar{g}\bar{h}. And since there are only finitely many polynomials of
each degree in F_p[x], all factorizations there can be carried out in finitely many steps.


(4.2) Proposition. Let f(x) = a_n x^n + ... + a_0 ∈ Z[x] be an integer polynomial,
and let p be a prime integer which does not divide a_n. If the residue \bar{f} of f
modulo p is irreducible, then f is irreducible in Q[x].


Proof. This follows from an inspection of the homomorphism. We need the
assumption that p does not divide a_n in order to rule out the possibility that a factor
g of f could reduce to a constant in F_p[x]. This assumption is preserved if we replace
f by the associated primitive polynomial. So we may assume that f is primitive. Since
p does not divide a_n, the degrees of f and \bar{f} are equal. If f factors in Q[x], then it
also factors in Z[x], by Corollary (3.5). Let f = gh be a proper factorization in
Z[x]. Since f is primitive, g and h have positive degree. Since deg f = deg \bar{f} and
\bar{f} = \bar{g}\bar{h}, it follows that deg g = deg \bar{g} and deg h = deg \bar{h}, hence
that \bar{f} = \bar{g}\bar{h} is a proper factorization, which shows that \bar{f} is
reducible. □


Suppose we suspect that a given polynomial f(x) ∈ Z[x] is irreducible. Then
we can try reduction modulo p for a few low primes, p = 2 or 3 for instance, and
hope that \bar{f} turns out to be of the same degree and irreducible. If so, we will have
proved that f is irreducible too. Note also that since F_p is a field, the results of
Theorem (1.5) hold for the ring F_p[x].

Unfortunately, there exist integer polynomials which are irreducible, though 
they can be factored modulo p for every prime p. The polynomial x^4 - 10x^2 + 1 is
an example. So the method of reduction modulo p will not always work. But it does 
work quite often. 

The irreducible polynomials in F_p[x] can be found by the “sieve” method. The
sieve of Eratosthenes is the name given to the following method of determining the 
primes less than a given number n. We list the integers from 2 to n. The first one, 2, 
is prime because any proper factor of 2 must be smaller than 2, and there is no 
smaller integer on the list. We make a note of the fact that 2 is prime, and then we 
cross out the multiples of 2 from our list. Except for 2 itself, they are not prime. The 
first integer which is left, 3, is a prime because it isn’t divisible by any smaller 
prime. We note that 3 is a prime and then cross out the multiples of 3 from our list. 
Again, the smallest remaining integer, 5, is a prime, and so on. 


    2  3  [4]  5  [6]  7  [8]  [9]  [10]  11  [12]  13  [14]  [15]  [16]  17  [18]  19 ...

(bracketed entries are crossed out)


This method will also determine the irreducible polynomials in F_p[x]. We list
all polynomials, degree by degree, and then cross out products. For example, the
linear polynomials in F_2[x] are x and x + 1. They are irreducible. The polynomials
of degree 2 are x^2, x^2 + x, x^2 + 1, and x^2 + x + 1. The first three are divisible by
x or by x + 1, so the last one is the only irreducible polynomial of degree 2 over F_2.


(4.3) The irreducible polynomials of degree ≤ 4 over F_2:

    x,  x + 1;  x^2 + x + 1;  x^3 + x^2 + 1,  x^3 + x + 1;
    x^4 + x^3 + 1,  x^4 + x + 1,  x^4 + x^3 + x^2 + x + 1.


By trying the polynomials on this list, we can factor all polynomials of degree 9 or
less in F_2[x].

As a sample application of (4.2), the polynomial x^4 - 6x^3 + 12x^2 - 3x + 9 is
irreducible in Q[x], because its residue in F_2[x] is x^4 + x + 1.
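
The sieve over F_2 is small enough to run by machine. In the sketch below, which
is only one possible encoding, a polynomial over F_2 is stored as an integer bit
mask (bit i is the coefficient of x^i), so that x + 1 is 0b11 and x^4 + x + 1 is
0b10011. The function names are hypothetical.

```python
def gf2_mul(a, b):
    """Product of two polynomials over F_2, each encoded as a bit mask."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition of coefficients mod 2 is XOR
        a <<= 1
        b >>= 1
    return result


def irreducibles_gf2(max_degree):
    """List polynomials of degree 1..max_degree and cross out all products."""
    limit = 1 << (max_degree + 1)        # masks below limit have degree <= max_degree
    crossed = set()
    for a in range(2, limit):            # masks >= 2 are the polynomials of degree >= 1
        for b in range(a, limit):
            prod = gf2_mul(a, b)
            if prod < limit:
                crossed.add(prod)
    return [f for f in range(2, limit) if f not in crossed]


def show(f):
    """Render a bit mask as a polynomial in x."""
    terms = ["1" if i == 0 else "x" if i == 1 else f"x^{i}"
             for i in range(f.bit_length() - 1, -1, -1) if (f >> i) & 1]
    return " + ".join(terms)


table = irreducibles_gf2(4)
print([show(f) for f in table])

# residue modulo 2 of x^4 - 6x^3 + 12x^2 - 3x + 9, coefficients from a_0 upward
residue = sum((a % 2) << i for i, a in enumerate([9, -3, 12, -6, 1]))
print(show(residue), residue in table)
```

The sieve returns the eight polynomials of list (4.3), and the residue of the
sample polynomial, x^4 + x + 1, is among them.
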


(4.4) The monic irreducible polynomials of degree 2 over F_3:

    x^2 + 1,  x^2 + x - 1,  x^2 - x - 1.


Reduction modulo p may help describe the factorization of a polynomial even
though the residue is reducible. Consider the polynomial f(x) = x^3 + 6x + 3 for
instance. Reducing modulo 3, we obtain x^3. This doesn’t look like a promising tool.
However, suppose that f(x) were reducible, say (ax + b)(cx^2 + dx + e) =
x^3 + 6x + 3. Then the residue of ax + b would have to divide x^3 in F_3[x], which
would imply b ≡ 0 (modulo 3). Similarly, we could conclude e ≡ 0 (modulo 3). It
is impossible to satisfy both of these conditions, because be = 3. Therefore no such
factorization exists, and f(x) is irreducible.

The principle at work in this example is called the Eisenstein Criterion. 


(4.5) Proposition. Eisenstein Criterion: Let f(x) = a_n x^n + ... + a_0 ∈ Z[x] be
an integer polynomial, and let p be a prime integer. Suppose that the coefficients of
f satisfy the following conditions:


(i) p does not divide a_n;
(ii) p divides the other coefficients a_{n-1}, ..., a_0;
(iii) p^2 does not divide a_0.


Then f is irreducible in Q[x]. If f is primitive, it is irreducible in Z[x].

For example, x^4 + 50x^2 + 30x + 20 is irreducible in Q[x] and in Z[x].


Proof of the Eisenstein Criterion. Assume that the conditions are met for f. Let
\bar{f} denote the residue modulo p. The hypotheses (i) and (ii) imply that
\bar{f} = \bar{a}_n x^n and that \bar{a}_n ≠ 0. If f is reducible in Q[x], then it will
factor in Z[x] into factors of positive degree, say f = gh. Then \bar{g} and \bar{h}
divide \bar{a}_n x^n, and hence each of these polynomials is a monomial. Therefore all
coefficients of g and of h, except the highest ones, are divisible by p. Let the constant
coefficients of g, h be b_0, c_0. Then the constant coefficient of f is a_0 = b_0 c_0. Since
p divides b_0 and c_0, it follows that p^2 divides a_0, which contradicts (iii). This shows
that f is irreducible. The last assertion follows from (3.6). □
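
The three conditions of the criterion are simple divisibility tests, so they can be
checked mechanically. The following Python sketch does this; the function name
eisenstein_applies is an illustrative choice, coefficients are listed from the
constant term upward, and p is assumed (not verified) to be prime.

```python
def eisenstein_applies(coeffs, p):
    """coeffs = [a_0, ..., a_n] with n >= 1; p is assumed to be a prime integer.

    Returns True when conditions (i)-(iii) of the Eisenstein Criterion hold.
    """
    *lower, lead = coeffs
    return (lead % p != 0                          # (i)   p does not divide a_n
            and all(a % p == 0 for a in lower)     # (ii)  p divides a_{n-1}, ..., a_0
            and coeffs[0] % (p * p) != 0)          # (iii) p^2 does not divide a_0


print(eisenstein_applies([20, 30, 50, 0, 1], 5))   # x^4 + 50x^2 + 30x + 20
print(eisenstein_applies([3, 6, 0, 1], 3))         # x^3 + 6x + 3
```

Note that a result of False says only that the criterion does not apply for that
prime; it does not show that the polynomial is reducible.
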

One of the most important applications of the Eisenstein Criterion is to prove
the irreducibility of the cyclotomic polynomial x^{p-1} + x^{p-2} + ... + x + 1, whose
roots are the pth roots of unity other than 1, the powers of ζ = e^{2πi/p}:


(4.6) Corollary. Let p be a prime. The polynomial
f(x) = x^{p-1} + x^{p-2} + ... + x + 1 is irreducible in Q[x].


Proof. We note that (x - 1) f(x) = x^p - 1. Next, we make the substitution
x = y + 1 into this product, obtaining


    y f(y + 1) = (y + 1)^p - 1 = y^p + \binom{p}{1} y^{p-1} + ... + \binom{p}{p-1} y.


We have \binom{p}{i} = p(p - 1) ... (p - i + 1)/i!. If 0 < i < p, then the prime p
isn’t a factor of i!, so i! divides the product (p - 1) ... (p - i + 1) of the remaining
terms in the numerator of the integer \binom{p}{i}. This implies that \binom{p}{i} is
divisible by p. Dividing the expansion of y f(y + 1) by y shows that f(y + 1) satisfies
the conditions of the Eisenstein Criterion (its constant term is \binom{p}{p-1} = p,
which is not divisible by p^2), hence that it is an irreducible polynomial. It follows
that f(x) is irreducible too. □
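
The substitution in this proof can be checked numerically for small primes. In the
sketch below (function name hypothetical), the coefficients of f(y + 1) are read off
from the binomial expansion of (y + 1)^p - 1 divided by y, and the three Eisenstein
conditions are then tested directly.

```python
from math import comb


def shifted_cyclotomic(p):
    """Coefficients [c_0, ..., c_{p-1}] of f(y + 1), where f(x) = x^(p-1) + ... + x + 1.

    Since y*f(y + 1) = (y + 1)^p - 1, the coefficient of y^k is binom(p, k + 1).
    """
    return [comb(p, k + 1) for k in range(p)]


for p in (2, 3, 5, 7):
    c = shifted_cyclotomic(p)
    ok = (c[-1] % p != 0                            # leading coefficient is 1
          and all(a % p == 0 for a in c[:-1])       # lower coefficients divisible by p
          and c[0] % (p * p) != 0)                  # constant term is p itself
    print(p, c, ok)
```

For p = 5, for example, f(y + 1) = y^4 + 5y^3 + 10y^2 + 10y + 5.
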


It is instructive to examine the statement analogous to the Eisenstein Criterion
when the ring of integers is replaced by the polynomial ring C[t]. Then Z[x] gets
replaced by C[t][x] ≈ C[t, x], the polynomial ring in two variables.


(4.7) Proposition. Let f(t, x) be an element of C[t, x], written as a polynomial in
x whose coefficients are polynomials in t: f(t, x) = a_n(t)x^n + ... + a_1(t)x + a_0(t).
Suppose that


(i) t does not divide a_n(t),
(ii) t divides a_{n-1}(t), ..., a_0(t),
(iii) t^2 does not divide a_0(t).


Then f(t, x) is irreducible in the ring C(t)[x]. If f is primitive, meaning that it has no
factor which is a polynomial in t alone, then f is irreducible in C[t, x].


This can be proved exactly as we proved (4.5), replacing F_p[x] by
C[x] ≈ C[t, x]/(t). But let us examine the geometry of this situation by considering
the locus W = {f(t, x) = 0} in complex 2-space. Conditions (i) and (ii) of (4.7)
imply that f(0, x) = c x^n, where c = a_n(0) ≠ 0. Consequently the only solution of
f(t, x) = 0 with t = 0 is t = x = 0, so the variety W meets the x-axis {t = 0} only
at the origin.

Suppose that f(t, x) is reducible: f(t, x) = g(t, x)h(t, x). Then W is the union
of the two varieties U = {g = 0} and V = {h = 0}. Also, c x^n = f(0, x) =
g(0, x)h(0, x). Hence g(0, x) is a constant times x^r, and h(0, x) is a constant times
x^{n-r}, where r is the degree of g in the variable x. Therefore g and h both vanish at
the origin. It follows that the origin is a singular point of W, meaning that the partial
derivatives ∂f/∂x and ∂f/∂t both vanish at (0, 0). This is checked by differentiating
the product gh. On the other hand, ∂f/∂t (0, 0) = (da_0/dt)(0), and this is the linear
coefficient of a_0(t). If it vanishes, t^2 divides a_0(t), contrary to (4.7iii). □


5. PRIMES IN THE RING OF GAUSS INTEGERS 


We have seen that the ring of Gauss integers is a Euclidean domain. Its units are
{±1, ±i}, and every element which is not zero and not a unit is a product of prime
elements. In this section we will study these prime elements, called Gauss primes,
and their relation to prime integers. We looked at some examples in Section 2,
where we saw that the prime integer 5 factors in Z[i]: 5 = (2 + i)(2 - i), while 3
does not factor; 3 is a Gauss prime. Remember that since there are four units, there
are four associate factorizations of the integer 5 which we consider equivalent:


    (2 + i)(2 - i),  (-2 - i)(-2 + i),  (1 - 2i)(1 + 2i),  (-1 + 2i)(-1 - 2i).


We will now show that the examples 3 and 5 exhibit the two ways that prime in- 
tegers can factor in the ring Z[i]. The story is summed up in this theorem: 


(5.1) Theorem. 


(a) Let p be a prime integer. Then either p is a Gauss prime, or else it is the
product of two complex conjugate Gauss primes: p = π\bar{π}.

(b) Let π be a Gauss prime. Then either π\bar{π} is a prime integer, or else it is the
square of a prime integer.

(c) The prime integers which are Gauss primes are those congruent to 3 modulo 4;
that is, p = 3, 7, 11, 19, ....

(d) Let p be a prime integer. The following are equivalent:

(i) p is a product of two complex conjugate Gauss primes.
(ii) p is the sum of two integer squares: p = a^2 + b^2, with a, b ∈ Z.
(iii) The congruence x^2 ≡ -1 (modulo p) has an integer solution.
(iv) p ≡ 1 (modulo 4), or p = 2; that is, p = 2, 5, 13, 17, ....


It will take some time to prove all parts of this theorem. 
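
Before the proof, the equivalence of conditions (ii), (iii), and (iv) of part (d) can
be observed numerically for small primes. The Python sketch below uses brute-force
searches; all function names are illustrative, and the check is of course no
substitute for the argument that follows.

```python
from math import isqrt


def is_prime(n):
    """Trial division; adequate for the small range used here."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def two_squares(p):
    """Return (a, b) with a*a + b*b == p and 0 <= a <= b, or None if there is none."""
    for a in range(isqrt(p // 2) + 1):
        b = isqrt(p - a * a)
        if a * a + b * b == p:
            return a, b
    return None


def has_root_of_minus_one(p):
    """True if x*x + 1 is divisible by p for some residue x (brute force)."""
    return any((x * x + 1) % p == 0 for x in range(p))


for p in filter(is_prime, range(2, 40)):
    # columns: p, a representation as a sum of two squares (if any),
    # solvability of the congruence, and condition (iv)
    print(p, two_squares(p), has_root_of_minus_one(p), p == 2 or p % 4 == 1)
```

For each prime in the range, the last three columns agree: a representation
a^2 + b^2 exists exactly when the congruence is solvable and exactly when p = 2 or
p leaves remainder 1 on division by 4.
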
The following lemma follows directly from the definition of a Gauss integer: 

(5.2) Lemma. A Gauss integer which is a real number is an ordinary integer. An
ordinary integer d divides another integer a in Z[i] if and only if d divides a in Z.
Moreover, d divides a Gauss integer a + bi if and only if d divides both a and b.


Now to prove part (a) of the theorem, let p be an integer prime. Then p is not
a unit in the ring Z[i]. Hence it has a Gauss prime divisor, say π = a + bi, where
a, b ∈ Z. The complex conjugate \bar{π} = a - bi also divides p because \bar{p} = p, so
π\bar{π} = a^2 + b^2 divides p^2 in the ring of Gauss integers. Being an integer,
π\bar{π} is an integer divisor of p^2. There are two possibilities: π may be an associate
of p. In this case, p is a Gauss prime. Otherwise π is a proper divisor of p in the ring
of Gauss integers, and then π\bar{π} is a proper divisor of p^2 in the ring Z. Since
π\bar{π} is a positive integer, π\bar{π} = p in this case.

We can turn this argument around to prove (b). Let π be a Gauss prime. Then
π\bar{π} is a positive integer, say π\bar{π} = n. We factor n into primes in the ring of
integers. This factorization will also be a factorization in the Gauss integers, though
not necessarily a prime factorization. Since π is a Gauss prime which divides n in
Z[i], it divides one of the integer prime factors of n. Thus π divides an integer prime
p. Then π\bar{π} is an integer divisor of p^2, hence π\bar{π} = p or p^2.

Note that part (c) of Theorem (5.1) is a formal consequence of (a) and of the
equivalence of conditions (d)(i) and (d)(iv). So we need not consider part (c) further,
and we now turn to the proof of part (d). It is easy to see that (i) and (ii) of part (d)
are equivalent: Suppose that p = π\bar{π} for some Gauss prime π = a + bi. Then
p = π\bar{π} = (a + bi)(a - bi) = a^2 + b^2, so p is a sum of two integer squares.
Conversely, if p = a^2 + b^2, then p = (a + bi)(a - bi) provides a factorization of p in
the ring of Gauss integers, which is a prime factorization because of (a).

The equivalence of (d)(i) and (d)(iii) of Theorem (5.1) is harder to prove. To
do so, we go back to the formal construction of the Gauss integers. The ring Z[i] is
obtained from the ring Z by adjoining an element i with the relation i^2 + 1 = 0. So
there is an isomorphism


(5.3)    Z[x]/(x^2 + 1) → Z[i].


Let (p) denote the principal ideal generated by a prime integer p in the ring of Gauss
integers. Its elements are the Gauss integers a + bi such that a and b are both
divisible by p. Denote by R' the quotient ring Z[i]/(p). Then R' can also be thought of
as the ring obtained by introducing the two relations


(5.4)    x^2 + 1 = 0  and  p = 0

into the polynomial ring Z[x]. So we have an isomorphism

(5.5)    Z[x]/(x^2 + 1, p) → R',


where (x^2 + 1, p) denotes the ideal of Z[x] generated by the two elements.


(5.6) Lemma. Let p be a prime integer. The following statements are equivalent: 


(i) p is a Gauss prime;
(ii) the ring R' = Z[i]/(p) is a field;
(iii) x^2 + 1 is an irreducible polynomial in the ring F_p[x].


Proof. The equivalence of the first two statements follows from Proposition 
(2.14). What we are really after is the equivalence of (i) and (iii), and at first glance, 
these two statements do not seem to be related at all. It was in order to obtain this 
equivalence that we introduced the auxiliary ring R’. The proof is based on the fol- 
lowing elementary but remarkably useful observation, which follows from the Third 
Isomorphism Theorem [Chapter 10 (4.3b)]: 


(5.7)    To construct the ring R', it does not matter which of the two relations
(5.4) is introduced into the ring Z[x] first.


So let us reverse the order and begin by killing the element p. The Substitution
Principle tells us what we will get. The kernel of the homomorphism Z[x] → F_p[x] is
precisely the ideal pZ[x]. Since this map is surjective, it induces an isomorphism


    Z[x]/pZ[x] → F_p[x].


We now introduce our other relation x^2 + 1 = 0 into this ring, interpreting the
coefficients of this polynomial as elements of F_p. The result is an isomorphism


(5.8)    F_p[x]/(x^2 + 1) → R'.


Proposition (2.14), applied to the ring F_p[x], shows that R' is a field if and only if
x^2 + 1 is irreducible in F_p[x]. □


We can now prove the equivalence of conditions (d)(i) and (d)(iii) of (5.1). We
know by Lemma (5.6) that p is a Gauss prime if and only if x^2 + 1 is an irreducible
polynomial in the ring F_p[x]. Since it is a quadratic polynomial, x^2 + 1 is reducible
if it has a root in F_p and irreducible if it has no root. Also, the residue of an integer
a (modulo p) is a root of x^2 + 1 if and only if a^2 ≡ -1 (modulo p). Thus the
congruence x^2 ≡ -1 (modulo p) has a solution if and only if x^2 + 1 is reducible
modulo p, which happens if and only if p is not a Gauss prime. By part (a), p is not a
Gauss prime exactly when it is a product of two complex conjugate Gauss primes.
The equivalence of (i) and (iii) follows.

It remains to prove the equivalence of condition (iv) of part (d) with one of the 
other conditions. We will show its equivalence with condition (iii). The congruence 
x^2 ≡ -1 (modulo 2) does have the solution x = 1, so it is sufficient to look at the
other primes, that is, at the odd primes. The following lemma does the job: 


(5.9) Lemma. Let p be an odd prime, and let \bar{a} denote the residue of an integer a
modulo p.


(a) The integer a solves the congruence x^2 ≡ -1 (modulo p) if and only if its
residue \bar{a} is an element of order 4 in the multiplicative group of the field F_p.


(b) The multiplicative group F_p^× contains an element of order 4 if and only if
p ≡ 1 (modulo 4).
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Proof. There is exactly one element of order 2 in F,*, namely the residue of 
—1. This is because an element of order 2 is a root of the polynomial x* — 1, and 
we know the roots of this polynomial: They are +1 in any field [see (1.7)]. If a 
residue @ has order 4 in F,*, then @’ has order 2; hence @? = —1, which means 
a* = ~1| (modulo p). Conversely, if a = —1 (modulo p), then @ has order 4 in F,*. 
This proves part (a) of the lemma. 

Now the order of the group $\mathbb{F}_p^\times$ is $p - 1$. So if this group contains an element 
of order 4, then $p - 1$ is divisible by 4, or equivalently $p \equiv 1$ (modulo 4). 
Conversely, suppose that $p - 1$ is divisible by 4, and let H be the Sylow 2-subgroup of 
$\mathbb{F}_p^\times$, whose order is the largest power $2^r$ of 2 which divides $p - 1$. Since 4 divides 
$p - 1$, the order of H is at least 4, so there is an element $\bar{a}$ in H different from $\pm 1$. 
This element does not have order 2, nor does it have order 1. But since H is a 
2-group, the order of $\bar{a}$ is a power of 2. So some power of $\bar{a}$ has order exactly 4. 


This completes the proof of Theorem (5.1). □ 
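
Lemma (5.9) is easy to test numerically. The following Python sketch is an illustration added here, not part of the proof, and its function names are our own. For each odd prime p below 60 it lists the residues solving $x^2 \equiv -1$ and the residues of order 4, and checks that the two lists agree and are nonempty exactly when $p \equiv 1$ (modulo 4).

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def solutions_of_minus_one(p):
    """Residues x in F_p with x^2 = -1 (modulo p)."""
    return [x for x in range(p) if (x * x + 1) % p == 0]


def multiplicative_order(x, p):
    """Order of a nonzero residue x in the multiplicative group of F_p."""
    order, power = 1, x % p
    while power != 1:
        power = (power * x) % p
        order += 1
    return order


for p in (q for q in range(3, 60) if is_prime(q)):
    roots = solutions_of_minus_one(p)
    order_four = [x for x in range(1, p) if multiplicative_order(x, p) == 4]
    assert roots == order_four                # part (a) of the lemma
    assert bool(roots) == (p % 4 == 1)        # part (b) of the lemma
```

A finite check of this kind proves nothing for general p, but it is a convenient way to catch a misreading of the statement.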


6. ALGEBRAIC INTEGERS 


In the next sections we are going to study factorization of algebraic numbers in a 
simple but important case, that of quadratic imaginary integers. The ring of Gauss 
integers is our model here. It was in order to extend the properties of factorization of 
ordinary integers to algebraic numbers that ideals were first introduced, and the ex- 
tension is very beautiful. 

In contrast to most of the topics we have studied, the arithmetic of quadratic 
number fields is not of universal importance. It has many applications to arithmetic, 
but not so many in other areas of mathematics. Our reason for including this topic, 
aside from its elegance, is its historical importance. Many of our algebraic tools 
were first developed in order to extend arithmetic properties of the integers to alge- 
braic numbers. 

A typical application of algebraic numbers to arithmetic is to the problem of 
determining integer points on an ellipse such as 


(6.1)   $x^2 + 5y^2 = p$, 


where for simplicity we assume that p is a prime. To determine integer points on the 
circle $x^2 + y^2 = p$, we may begin by factoring the left side, obtaining 
$(x + iy)(x - iy) = p$, and then use arithmetic in the Gauss integers to analyze the 
factorization. We did this in our proof of Theorem (5.1). The analogous procedure 
for equation (6.1) leads to 


$(x + \sqrt{-5}\,y)(x - \sqrt{-5}\,y) = p$, 


so we may attempt an analysis in the ring $\mathbb{Z}[\sqrt{-5}]$. However, as we have seen, 
factorization is not unique in this ring. We will have some trouble. 
Another example is the famous Fermat Equation 


(6.2)   $x^3 + y^3 = z^3$. 

It was proved by Euler that this equation has no integer solutions, except for the 
trivial solutions in which one of the variables is zero. To analyze it, we may bring $y^3$ to 
the other side and factor, obtaining 


(6.3)   $x^3 = (z - y)(z - \zeta y)(z - \zeta^2 y)$, 

where 

(6.4)   $\zeta = \tfrac12(-1 + \sqrt{-3}) = e^{2\pi i/3}$ 


is a complex cube root of 1. One can then analyze this equation using arithmetic in 
the ring $\mathbb{Z}[\zeta]$. This ring happens to be a Euclidean domain, so unique factorization is 
available. Unfortunately, the proof that (6.2) has no nontrivial solution is fairly 
complicated, so we will not give it. 

Problems of this type, which ask for integer solutions of polynomial equations, 
are called Diophantine problems. We will analyze a few of them in Section 12, when 
the necessary tools have been assembled. 

A complex number $\alpha$ is called algebraic if it is the root of a nonzero 
polynomial f(x) with rational coefficients (Chapter 10, Section 1). We can, of course, clear 
denominators in the coefficients of the polynomial f(x). So if $\alpha$ is an algebraic 
number, then it is also the root of a polynomial with integer coefficients. The number $\alpha$ 
is called an algebraic integer if it is the root of a monic polynomial with integer 
coefficients, a polynomial of the form 


(6.5)   $x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$, with $a_i \in \mathbb{Z}$. 


Thus the cube root of unity $\zeta$, being a root of the polynomial $x^3 - 1$, is an algebraic 
integer. 

Let $\alpha$ be an algebraic number. The set of all polynomials in $\mathbb{Q}[x]$ which have $\alpha$ 
as a root is the kernel of the substitution homomorphism 


$\mathbb{Q}[x] \to \mathbb{C}$, defined by $f(x) \mapsto f(\alpha)$. 


So it is a principal ideal, generated by an irreducible element f(x) of the polynomial 
ring which is called the irreducible polynomial for $\alpha$ over $\mathbb{Q}$. (Why is f irreducible?) 
It is the polynomial of lowest degree having $\alpha$ as a root and is unique up to a 
constant factor. The degree of the irreducible polynomial for $\alpha$ is also called the degree 
of $\alpha$ over $\mathbb{Q}$. 

We may choose this irreducible polynomial f(x) for $\alpha$ to be a primitive 
polynomial in $\mathbb{Z}[x]$. Then f(x) also generates the ideal of $\mathbb{Z}[x]$ of all integer polynomials 
having $\alpha$ as a root. 


(6.6) Proposition. The kernel of the map $\mathbb{Z}[x] \to \mathbb{C}$ sending $x \mapsto \alpha$ is the 
principal ideal of $\mathbb{Z}[x]$ generated by the primitive irreducible polynomial for $\alpha$. 


Proof. Let f(x) be the primitive irreducible polynomial for $\alpha$. If $g \in \mathbb{Z}[x]$ has 
$\alpha$ as a root, then f divides g in $\mathbb{Q}[x]$, and hence f divides g in $\mathbb{Z}[x]$ too, by (3.4). So 
g is in the principal ideal of $\mathbb{Z}[x]$ generated by f. □ 

Note that the leading coefficient of a polynomial f(x) divides the leading 
coefficient of any multiple in $\mathbb{Z}[x]$. So it follows from Proposition (6.6) that if the 
primitive irreducible polynomial f(x) for $\alpha$ is not monic, then $\alpha$ is not the root of 
any monic integer polynomial. 


(6.7) Proposition. An algebraic number $\alpha$ is an algebraic integer if and only if the 
primitive irreducible polynomial for $\alpha$ is monic. Equivalently, $\alpha$ is an algebraic 
integer if and only if the monic irreducible polynomial for $\alpha$ in $\mathbb{Q}[x]$ has integer 
coefficients. □ 


The primitive irreducible polynomial for the cube root of unity $\zeta$ is $x^2 + x + 1$. 


(6.8) Corollary. A rational number r is an algebraic integer if and only if it is an 
ordinary integer. 


For, the monic irreducible polynomial over $\mathbb{Q}$ of a rational number r is $x - r$. □ 


Proposition (6.7) can be used to decide whether or not an algebraic number is 
an algebraic integer, provided that we can compute its irreducible polynomial. For 
example, $\alpha = \tfrac12(1 + \sqrt2)$ is a root of $4x^2 - 4x - 1$. This is the primitive 
irreducible polynomial for $\alpha$. Hence $\alpha$ is not an algebraic integer. 

The concept of algebraic integer was one of the most important discoveries of 
number theory. It is not easy to explain quickly why it is the right definition to use, 
but roughly speaking, we can think of the leading coefficient of the primitive 
irreducible polynomial f(x) for $\alpha$ as a "denominator." If $\alpha$ is the root of an integer 
polynomial $f(x) = dx^n + a_{n-1}x^{n-1} + \cdots + a_0$, then $d\alpha$ is an algebraic integer, 
because it is a root of the monic integer polynomial 


(6.9)   $x^n + a_{n-1}x^{n-1} + d\,a_{n-2}x^{n-2} + \cdots + d^{\,n-2}a_1 x + d^{\,n-1}a_0$. 


Thus we can "clear the denominator" in any algebraic number $\alpha$ by multiplying it 
with a suitable integer to get an algebraic integer. The leading coefficient is, 
however, not a precise denominator. Thus if $\alpha = \tfrac12(1 + \sqrt2)$, then $2\alpha$ is an algebraic 
integer, while the leading coefficient of its primitive irreducible polynomial is 4. 
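
As a worked instance of (6.9), added here as an illustrative check rather than as part of the original argument, take $\alpha = \tfrac12(1 + \sqrt2)$ with primitive irreducible polynomial $4x^2 - 4x - 1$, so that $d = 4$ and $n = 2$:

```latex
\begin{align*}
f(x) &= 4x^2 - 4x - 1, \qquad d = 4,\quad a_1 = -4,\quad a_0 = -1, \\
\text{polynomial (6.9):}\quad & x^2 + a_1 x + d\,a_0 = x^2 - 4x - 4, \\
(4\alpha)^2 - 4(4\alpha) - 4
  &= (2 + 2\sqrt2)^2 - 4(2 + 2\sqrt2) - 4 \\
  &= (12 + 8\sqrt2) - (8 + 8\sqrt2) - 4 = 0 .
\end{align*}
```

So $4\alpha$ is an algebraic integer, as (6.9) predicts, although the smaller multiple $2\alpha = 1 + \sqrt2$, a root of $x^2 - 2x - 1$, already is one.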

In another direction, the example of the algebraic integer $\zeta = \tfrac12(-1 + \sqrt{-3})$ 
shows that we must not jump to conclusions just because some expression for an 
algebraic number has denominators. 

Explicit computation with algebraic integers is not very easy. It is a fact that 
they form a subring of C, that is, that sums and products of algebraic integers are 
algebraic integers, but this isn’t obvious. Rather than develop a general theory, we 
will work out the case of quadratic extensions explicitly. 

A quadratic number field $F = \mathbb{Q}[\sqrt d]$ consists of all complex numbers 


(6.10)   $a + b\sqrt d$, with $a, b \in \mathbb{Q}$, 


where d is a fixed integer, positive or negative, which is not a rational square. The 
notation $\sqrt d$ will stand for the positive square root if d > 0 and for the positive 
imaginary square root if d < 0. If d has a square integer factor, we can pull it out of 
the radical and put it into b without changing the field. Therefore it is customary to 
assume that d is square free, meaning that $d = \pm p_1 \cdots p_r$ where the $p_i$ are distinct 
primes, or that d = −1. So the values we take are 


$d = -1, \pm 2, \pm 3, \pm 5, \pm 6, \pm 7, \pm 10, \ldots$. 


The field F is called a real quadratic number field if d > 0, or an imaginary 
quadratic number field if d < 0. 

We will now compute the algebraic integers in F. The computation for a spe- 
cial value of d is no simpler than the general case. Nevertheless, you may wish to 
substitute a value such as d = 5 when going over this computation. We set 


(6.11)   $\delta = \sqrt d$. 

When d is negative, $\delta$ is purely imaginary. Let 

$\alpha = a + b\delta$ 


be any element of F which is not in $\mathbb{Q}$, that is, such that $b \neq 0$. Then $\alpha' = a - b\delta$ 
is also in F. If d is negative, $\alpha'$ is the complex conjugate of $\alpha$. Note that $\alpha$ is a root 
of the polynomial 


(6.12)   $(x - \alpha)(x - \alpha') = x^2 - (\alpha + \alpha')x + \alpha\alpha' = x^2 - 2ax + (a^2 - b^2 d)$. 


This polynomial has the rational coefficients $-2a$ and $a^2 - b^2 d$. Since $\alpha$ is not a 
rational number, it is not the root of a linear polynomial. So (6.12) is irreducible and 
is therefore the monic irreducible polynomial for $\alpha$ over $\mathbb{Q}$. According to (6.7), $\alpha$ is 
an algebraic integer if and only if (6.12) has integer coefficients. Thus we have the 
following corollary: 


(6.13) Corollary. $\alpha = a + b\delta$ is an algebraic integer if and only if $2a$ and 
$a^2 - b^2 d$ are integers. □ 


This corollary also holds when b = 0, because if $a^2$ is an integer for a rational number a, then so is a. If we 
like, we can use the conditions of the corollary as a definition of the integers in F. 
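
The criterion of Corollary (6.13) translates directly into a short test. The sketch below is an illustration in Python using exact rational arithmetic; the function name `is_algebraic_integer` is our own, and the inputs are assumed to be rational numbers a, b together with a square free integer d.

```python
from fractions import Fraction


def is_algebraic_integer(a, b, d):
    """Criterion of Corollary (6.13) for alpha = a + b*sqrt(d), a and b rational."""
    a, b = Fraction(a), Fraction(b)
    trace = 2 * a                  # alpha + alpha'
    norm = a * a - b * b * d       # alpha * alpha'
    return trace.denominator == 1 and norm.denominator == 1


half = Fraction(1, 2)
print(is_algebraic_integer(half, half, 2))     # (1 + sqrt 2)/2  -> False
print(is_algebraic_integer(-half, half, -3))   # zeta            -> True
print(is_algebraic_integer(1, 1, 2))           # 1 + sqrt 2      -> True
```

The first two calls reproduce the examples discussed above: $\tfrac12(1 + \sqrt2)$ fails the test because $a^2 - b^2 d = -\tfrac14$, while $\zeta$ passes it.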

The possibilities for a and b depend on the congruence class of d modulo 4. 
Note that since d is assumed to be square free, the case $d \equiv 0$ (modulo 4) has been 
ruled out, so $d \equiv 1, 2$, or 3 (modulo 4). 


(6.14) Proposition. The algebraic integers in the quadratic field $F = \mathbb{Q}[\sqrt d]$ have 
the form $\alpha = a + b\delta$, where: 


(a) If $d \equiv 2$ or 3 (modulo 4), then a and b are integers. 
(b) If $d \equiv 1$ (modulo 4), then either $a, b \in \mathbb{Z}$ or $a, b \in \mathbb{Z} + \tfrac12$. 


The cube root of unity $\zeta = \tfrac12(-1 + \sqrt{-3})$ is an example of an algebraic integer of 
the second type. On the other hand, since $-1 \equiv 3$ (modulo 4), the integers in the 
field $\mathbb{Q}[i]$ are just the Gauss integers. 

Proof of the Proposition. Since the coefficients of the irreducible polynomial 
(6.12) for $\alpha$ are $-2a$ and $a^2 - b^2 d$, $\alpha$ is certainly an algebraic integer if a and b are 
integers. Assume that $d \equiv 1$ (modulo 4) and that $a, b \in \mathbb{Z} + \tfrac12$. (We say that they 
are half integers.) Then $2a \in \mathbb{Z}$. To show that $a^2 - b^2 d \in \mathbb{Z}$, we write $a = \tfrac12 m$, 
$b = \tfrac12 n$, where m, n are odd integers. Computing modulo 4, we find 


$m^2 - n^2 d \equiv (\pm 1)^2 - (\pm 1)^2 \cdot 1 = 0$ (modulo 4). 


Hence $a^2 - b^2 d = \tfrac14(m^2 - n^2 d) \in \mathbb{Z}$, as required. 
Conversely, suppose that $\alpha$ is an algebraic integer. Then $2a \in \mathbb{Z}$ by Corollary 
(6.13). There are two cases: either $a \in \mathbb{Z}$ or $a \in \mathbb{Z} + \tfrac12$. 


Case 1: $a \in \mathbb{Z}$. It follows that $b^2 d \in \mathbb{Z}$ too. Now if we write b = m/n, where 
m, n are relatively prime integers and n > 0, then $b^2 d = m^2 d/n^2$. Since d is square 
free, it can't cancel a square in the denominator. So n = 1. If a is an integer, b must 
be an integer too. 

Case 2: $a \in \mathbb{Z} + \tfrac12$ is a half integer, say $a = \tfrac12 m$ as before. Then $4a^2 \in \mathbb{Z}$, and the 
condition $a^2 - b^2 d \in \mathbb{Z}$ implies that $4b^2 d \in \mathbb{Z}$ but $b^2 d \notin \mathbb{Z}$. Therefore b is also a 
half integer, say $b = \tfrac12 n$, where n is odd. In order for this pair of values for a, b to 
satisfy $a^2 - b^2 d \in \mathbb{Z}$, we must have $m^2 - n^2 d \equiv 0$ (modulo 4). Computing 
modulo 4, we find that $d \equiv 1$ (modulo 4). □ 


A convenient way to write all the integers in the case $d \equiv 1$ (modulo 4) is to 
introduce the algebraic integer 


(6.15)   $\eta = \tfrac12(1 + \delta)$, 

which is a root of the monic integer polynomial 

(6.16)   $x^2 - x + \tfrac14(1 - d)$. 


(6.17) Proposition. Assume that $d \equiv 1$ (modulo 4). Then the algebraic integers 
in $F = \mathbb{Q}[\sqrt d]$ are $a + b\eta$, where $a, b \in \mathbb{Z}$. □ 
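
Propositions (6.14) and (6.17) can be compared with the criterion of Corollary (6.13) by brute force on a finite sample. The following Python sketch is an added illustration, not a proof; the helper names, the range of d, and the sample of rational numbers are our own choices.

```python
from fractions import Fraction


def is_square_free(d):
    """True if no square k^2 > 1 divides d."""
    n = abs(d)
    return all(n % (k * k) != 0 for k in range(2, n + 1) if k * k <= n)


def criterion(a, b, d):
    """Corollary (6.13): 2a and a^2 - b^2 d are integers."""
    return (2 * a).denominator == 1 and (a * a - b * b * d).denominator == 1


def described_by_6_14(a, b, d):
    """The description of the integers given in Proposition (6.14)."""
    both_integers = a.denominator == 1 and b.denominator == 1
    both_half_integers = a.denominator == 2 and b.denominator == 2
    return both_integers or (d % 4 == 1 and both_half_integers)


def eta_coordinates(a, b):
    """Rewrite a + b*delta as u + v*eta, where eta = (1 + delta)/2."""
    return a - b, 2 * b


# Rationals with denominators 1, 2, 3, 4 in a small window (an arbitrary sample).
samples = sorted({Fraction(i, q) for q in (1, 2, 3, 4) for i in range(-8, 9)})

for d in range(-30, 31):
    if d in (0, 1) or not is_square_free(d):
        continue
    for a in samples:
        for b in samples:
            assert criterion(a, b, d) == described_by_6_14(a, b, d)
            if d % 4 == 1:
                u, v = eta_coordinates(a, b)
                in_Z_eta = u.denominator == 1 and v.denominator == 1
                assert criterion(a, b, d) == in_Z_eta
```

Note that Python's `%` returns a nonnegative remainder for a positive modulus, so `d % 4 == 1` holds for d = −3, −7, −11, as required.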


It is easy to show by explicit calculation that the integers in F form a ring R in 
each case, called the ring of integers in F. Computation in R can be carried out by 
high school algebra. 

The discriminant of F is defined to be the discriminant of the polynomial 
$x^2 - d$ in the case $R = \mathbb{Z}[\delta]$ and the discriminant of the polynomial 
$x^2 - x + \tfrac14(1 - d)$ if $R = \mathbb{Z}[\eta]$. This discriminant will be denoted by D. Thus 


(6.18)   $D = \begin{cases} 4d & \text{if } d \equiv 2, 3 \\ d & \text{if } d \equiv 1 \end{cases}$   (modulo 4). 


Since D can be computed in terms of d, it isn't very important to introduce a separate 
notation for it. However, some formulas become independent of the congruence 
class when they are expressed in terms of D rather than d. 

The imaginary quadratic case d < 0 is slightly easier to treat than the real one, 
so we will concentrate on it in the next sections. In the imaginary case, the ring R 
forms a lattice in the complex plane which is rectangular if $d \equiv 2, 3$ (modulo 4), 
and "isosceles triangular" if $d \equiv 1$ (modulo 4). When d = −1, R is the ring of 
Gauss integers, and the lattice is square. When d = −3, the lattice is equilateral 
triangular. Two other examples are depicted below. 


(6.19) Figure. Integers in some imaginary quadratic fields (two panels: d = −5 and d = −7). 


The property of being a lattice is very special to rings such as those we are 
considering here, and we will use geometry to analyze them. Thinking of R as a lat- 
tice is also useful for intuition. 

It will be helpful to carry along a specific example as we go. We will use the 
case d = −5 for this purpose. Since $-5 \equiv 3$ (modulo 4), the ring of integers forms 
a rectangular lattice, and $R = \mathbb{Z}[\delta]$, where $\delta = \sqrt{-5}$. 


7. FACTORIZATION IN IMAGINARY QUADRATIC FIELDS 


Let R be the ring of integers of an imaginary quadratic number field $F = \mathbb{Q}[\delta]$. If 
$\alpha = a + b\delta$ is in R, so is its complex conjugate $\bar\alpha = a - b\delta$. We call the norm of $\alpha$ 
the integer 


(7.1)   $N(\alpha) = \alpha\bar\alpha$. 


It is also equal to $a^2 - b^2 d$ and to $|\alpha|^2$, and it is the constant term of the irreducible 
polynomial for $\alpha$ over $\mathbb{Q}$. Thus $N(\alpha)$ is a positive integer unless $\alpha = 0$. Note that 


(7.2)   $N(\beta\gamma) = N(\beta)N(\gamma)$. 

This formula gives us some control of possible factors of an element $\alpha$ of R. Say that 
$\alpha = \beta\gamma$. Then both terms on the right side of (7.2) are positive integers. So to 
check for factors of $\alpha$, it is enough to look at elements $\beta$ whose norm divides $N(\alpha)$; 
this is not too big a job if a and b are reasonably small. 

In particular, let us ask for units of R: 


(7.3) Proposition. 


(a) An element $\alpha$ of R is a unit if and only if $N(\alpha) = 1$. 


(b) The units of R are $\{\pm 1\}$ unless d = −1 or −3. If d = −1, so that R is the ring 
of Gauss integers, the units are $\{\pm 1, \pm i\}$, and if d = −3 they are the powers 
of the 6th root of unity $\tfrac12(1 + \sqrt{-3})$. 


Proof. If $\alpha$ is a unit, then $N(\alpha)N(\alpha^{-1}) = N(1) = 1$. Since $N(\alpha)$ and $N(\alpha^{-1})$ 
are positive integers, they are both equal to 1. Conversely, if $N(\alpha) = \alpha\bar\alpha = 1$, then 
$\bar\alpha = \alpha^{-1}$. So $\alpha^{-1} \in R$, and $\alpha$ is a unit. Thus $\alpha$ is a unit if and only if it lies on the 
unit circle in the complex plane. The second assertion follows from the configuration 
of the lattice R [see Figure (6.19)]. □ 
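
Proposition (7.3) can be checked by listing the elements of norm 1. The Python sketch below is an added illustration with names of our own choosing. It assumes that an element of R is stored by its integer coordinates (a, b) with respect to the basis $(1, \delta)$ when $d \equiv 2, 3$ (modulo 4) and with respect to $(1, \eta)$ when $d \equiv 1$ (modulo 4); in the second case $N(a + b\eta) = a^2 + ab + \tfrac14(1 - d)b^2$.

```python
def norm(a, b, d):
    """Norm of the element of R with integer coordinates (a, b); d < 0 square free."""
    if d % 4 == 1:
        # Coordinates with respect to (1, eta), eta = (1 + sqrt(d))/2.
        return a * a + a * b + (1 - d) // 4 * b * b
    # Coordinates with respect to (1, delta), delta = sqrt(d).
    return a * a - d * b * b


def units(d):
    """Coordinates of all units of R. Since d < 0, norm 1 forces |a|, |b| <= 1,
    so the search window below is more than large enough."""
    return [(a, b) for a in range(-2, 3) for b in range(-2, 3)
            if norm(a, b, d) == 1]


for d in (-1, -2, -3, -5, -7, -11):
    print(d, len(units(d)), units(d))
```

The loop reports four units for d = −1, six for d = −3, and only the two units ±1 for the other values tried.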


Next we investigate factorization of an element $\alpha \in R$ into irreducible factors. 


(7.4) Proposition. Existence of factorizations is true in R. 

Proof. If $\alpha = \beta\gamma$ is a proper factorization in R, then $\beta, \gamma$ aren't units. So by 
Proposition (7.3), $N(\alpha) = N(\beta)N(\gamma)$ is a proper factorization in the ring of 
integers. The existence of factorizations in R now follows from the existence of 
factorizations in $\mathbb{Z}$. □ 


However, factorization into irreducible elements will not be unique in most 
cases. We gave a simple example with d = −5 in Section 2: 

(7.5)   $6 = 2 \cdot 3 = (1 + \delta)(1 - \delta)$, 


where $\delta = \sqrt{-5}$. For example, to show that $1 + \delta$ is irreducible, we note that its 
norm is $(1 + \delta)(1 - \delta) = 6$. A proper factor must have norm 2 or 3, that is, 
absolute value $\sqrt2$ or $\sqrt3$. There are no such points in the lattice R. 
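
The search for lattice points of a given norm is a finite computation. As an added illustration (the function name and its default argument are our own), the following Python sketch lists all $a + b\delta$ in $\mathbb{Z}[\sqrt d]$ with $a^2 - db^2 = n$, assuming d < 0 and $d \equiv 2, 3$ (modulo 4), and confirms that for d = −5 there are no elements of norm 2 or 3.

```python
from math import isqrt


def elements_of_norm(n, d=-5):
    """All (a, b) with a^2 - d*b^2 == n, i.e. elements a + b*sqrt(d) of norm n.
    Assumes d < 0 and d congruent to 2 or 3 modulo 4, so that R = Z[sqrt(d)]."""
    found = set()
    b_max = isqrt(n // -d)           # b^2 <= n / |d|
    for b in range(-b_max, b_max + 1):
        rest = n + d * b * b         # the value that a^2 must take
        a = isqrt(rest)
        if a * a == rest:
            found.update({(a, b), (-a, b)})
    return sorted(found)


print(elements_of_norm(2))   # no elements of norm 2
print(elements_of_norm(3))   # no elements of norm 3
print(elements_of_norm(6))   # the four elements +-1 +- delta
```

Since 2, 3, and $1 \pm \delta$ have norms 4, 9, and 6, any proper factor of one of them would have norm 2 or 3, so all four elements are irreducible.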

The same method provides examples for other values of d: 


(7.6) Proposition. The only ring R with $d \equiv 3$ (modulo 4) which is a unique 
factorization domain is the ring of Gauss integers. 


Proof. Assume that $d \equiv 3$ (modulo 4), but that $d \neq -1$. Then 

$1 - d = 2\left(\frac{1 - d}{2}\right)$ and $1 - d = (1 + \delta)(1 - \delta)$. 


There are two factorizations of $1 - d$ in R. The element 2 is irreducible because 
$N(2) = 4$ is the smallest value > 1 taken on by $N(\alpha)$. [The only points of R inside 
the circle of radius 2 about the origin are 0, 1, −1, when d = −5, −13, −17, .... See 
Figure (6.19).] So if there were a common refinement of the above factorizations, 2 
would divide either $1 + \delta$ or $1 - \delta$ in R, which it does not: $\tfrac12 \pm \tfrac12\delta$ is not in R when 
$d \equiv 3$ (modulo 4). □ 


Notice that this reasoning breaks down if $d \equiv 1$ (modulo 4). In that case, 2 
does divide $1 + \delta$, because $\tfrac12 + \tfrac12\delta \in R$. In fact, there are more cases of unique 
factorization when $d \equiv 1$ (modulo 4). The following theorem is very deep, and we will 
not prove it: 


(7.7) Theorem. Let R be the ring of integers in the imaginary quadratic field 
$\mathbb{Q}(\sqrt d)$. Then R is a unique factorization domain if and only if d is one of the 
integers −1, −2, −3, −7, −11, −19, −43, −67, −163. 


Gauss proved for these values of d that R is a unique factorization domain. We will 
learn how to do this. He also conjectured that there were no others. This much more 
difficult part of the theorem was finally proved by Baker and Stark in 1966, after the 
problem had been worked on for more than 150 years. 

Ideals were introduced to rescue the uniqueness of factorization. As we know 
(2.12), R must contain some nonprincipal ideals unless it is a unique factorization 
domain. We will see in the next section how these nonprincipal ideals serve as sub- 
stitutes for elements. 

Note that every nonzero ideal A is a sublattice of R: It is a subgroup under 
addition, and it is discrete because R is discrete. Moreover, if $\alpha$ is a nonzero element 
of A, then $\alpha\delta$ is in A too, and $\alpha, \alpha\delta$ are linearly independent over the real numbers $\mathbb{R}$. However, not 
every sublattice is an ideal. 


(7.8) Proposition. If $d \equiv 2$ or 3 (modulo 4), the nonzero ideals of R are the 
sublattices which are closed under multiplication by $\delta$. If $d \equiv 1$ (modulo 4), they are 
the sublattices which are closed under multiplication by $\eta = \tfrac12(1 + \delta)$. 


Proof. To be an ideal, a subset A must be closed under addition and under 
multiplication by elements of R. Any lattice is closed under addition and under 
multiplication by integers. So if it is also closed under multiplication by $\delta$, then it is also 
closed under multiplication by an element of the form $a + b\delta$, with $a, b \in \mathbb{Z}$. This 
includes all elements of R if $d \equiv 2, 3$ (modulo 4). The proof in the case that $d \equiv 1$ 
(modulo 4) is similar. □ 


In order to get a feeling for the possibilities, we will describe the ideals of the 
ring $R = \mathbb{Z}[\sqrt{-5}]$ before going on. The most interesting ideals are those which are 
not principal. 


(7.9) Theorem. Let $R = \mathbb{Z}[\delta]$, where $\delta = \sqrt{-5}$, and let A be a nonzero ideal of 
R. Let $\alpha$ be a nonzero element of A of minimal absolute value $|\alpha|$. There are two 
cases: 

Case 1: A is the principal ideal $(\alpha)$, which has the lattice basis $(\alpha, \alpha\delta)$. 
Case 2: A has the lattice basis $(\alpha, \tfrac12(\alpha + \alpha\delta))$, and is not a principal ideal. 


The second case can occur only if $\tfrac12(\alpha + \alpha\delta)$ is an element of R. The ideal 
$A = (2, 1 + \delta)$, which is depicted below, is an example. 


(7.10) Figure. The ideal $(2, 1 + \delta)$ in the ring $\mathbb{Z}[\delta]$, $\delta = \sqrt{-5}$. 


The statement of Theorem (7.9) has a geometric interpretation. Notice that 
the lattice basis $(\alpha, \alpha\delta)$ of the principal ideal $(\alpha)$ is obtained from the lattice basis 
$(1, \delta)$ of R by multiplication by $\alpha$. If we write $\alpha = re^{i\theta}$, then the effect of 
multiplication by $\alpha$ is to rotate the complex plane through the angle $\theta$ and then stretch by the 
factor r. So $(\alpha)$ and R are similar geometric figures, as we noted in Section 2. 
Similarly, the basis $(\alpha, \tfrac12(\alpha + \alpha\delta))$ is obtained by multiplication by $\tfrac12\alpha$ from the basis 
$(2, 1 + \delta)$. So the ideals listed in Case 2 are geometric figures similar to the one 
depicted in Figure (7.10). The similarity classes of ideals are called the ideal classes, 
and their number is called the class number of R. Thus Theorem (7.9) implies that 
the class number of $\mathbb{Z}[\sqrt{-5}]$ is 2. We will discuss ideal classes for other quadratic 
imaginary fields in Section 10. 

The proof of Theorem (7.9) is based on the following lemma about lattices 
in the complex plane: 


(7.11) Lemma. Let r be the minimum absolute value among nonzero elements of 
a lattice A, let $\gamma$ be an element of A, and let n be a positive integer. Let D be the disc of radius $\tfrac1n r$ about the 
point $\tfrac1n\gamma$. There is no point of A in the interior of D other than its center $\tfrac1n\gamma$. 


The point $\tfrac1n\gamma$ may lie in A or not. This depends on A and on $\gamma$. 


Proof. Let $\beta$ be a point in the interior of D. Then by definition of the disc, 
$|\beta - \tfrac1n\gamma| < \tfrac1n r$, or equivalently, $|n\beta - \gamma| < r$. If $\beta \in A$, then $n\beta - \gamma \in A$ too. 
In this case, $n\beta - \gamma$ is an element of A of absolute value less than r, which implies 
that $n\beta - \gamma = 0$, hence that $\beta = \tfrac1n\gamma$. □ 


Proof of Theorem (7.9). Let $\alpha$ be the chosen element of A of minimal absolute 
value r. The principal ideal $(\alpha) = R\alpha$ consists of the complex numbers $(a + b\delta)\alpha$, 
with $a, b \in \mathbb{Z}$. So it has the lattice basis $(\alpha, \alpha\delta)$ as is asserted in the theorem. 
Since A contains $\alpha$, it contains the principal ideal $(\alpha)$ too, and if $A = (\alpha)$ we are in 
Case 1. 

Suppose that $A \neq (\alpha)$, and let $\beta$ be an element of A which is not in $(\alpha)$. We 
may choose $\beta$ to lie in the rectangle whose four vertices are $0, \alpha, \alpha\delta, \alpha + \alpha\delta$ [see 
Chapter 5 (4.14)]. Figure (7.13) shows a disc of radius r about the four vertices of 
this rectangle, and a disc of radius $\tfrac12 r$ about the three half lattice points 
$\tfrac12\alpha\delta$, $\tfrac12(\alpha + \alpha\delta)$, and $\alpha + \tfrac12\alpha\delta$. Notice that the interiors of these discs cover the 
rectangle. According to Lemma (7.11), the only points of the interiors which can lie 
in A are the centers of the discs. Since $\beta$ is not in $(\alpha)$, it is not a vertex of the 
rectangle. So $\beta$ must be one of the half lattice points $\tfrac12\alpha\delta$, $\tfrac12(\alpha + \alpha\delta)$, or $\alpha + \tfrac12\alpha\delta$. 


(7.13) Figure. 


This exhausts the information which we can get from the fact that A is a lattice. 
We now use the fact that A is an ideal to rule out the two points $\tfrac12\alpha\delta$ and $\alpha + \tfrac12\alpha\delta$. 
Suppose that $\tfrac12\alpha\delta \in A$. Multiplying by $\delta$, we find that $\tfrac12\alpha\delta^2 = -\tfrac52\alpha \in A$ too and, 
since $\alpha \in A$, that $\tfrac12\alpha = -\tfrac52\alpha + 3\alpha \in A$. This contradicts our choice of $\alpha$. Next, we note that if 
$\alpha + \tfrac12\alpha\delta$ were in A, then $\tfrac12\alpha\delta$ would be in A too, which has been ruled out. The 
remaining possibility is that $\beta = \tfrac12(\alpha + \alpha\delta)$. If so, we are in Case 2. □ 


8. IDEAL FACTORIZATION 


Let R be the ring of integers in an imaginary quadratic field. In order to avoid 
confusion, we will denote ordinary integers by latin letters a, b, ..., elements of R by 
greek letters $\alpha, \beta, \ldots$, and ideals by capital letters A, B, .... We will consider only 
nonzero ideals of R. 

The notation $A = (\alpha, \beta, \ldots, \gamma)$ stands for the ideal generated by the elements 
$\alpha, \beta, \ldots, \gamma$. Since an ideal is a plane lattice, it has a lattice basis consisting of two 
elements. Any lattice basis generates the ideal, but we must distinguish between the 
notions of a lattice basis and a generating set. We also need to remember the 
dictionary (2.2) which relates elements to the principal ideals they generate. 

Dedekind extended the notion of divisibility to ideals using the following 
definition of ideal multiplication: Let A and B be ideals in a ring R. We would like to 
define the product ideal AB to be the set of all products $\alpha\beta$, where $\alpha \in A$ and 
$\beta \in B$. Unfortunately, this set of products is usually not an ideal: It will not be 
closed under sums. To get an ideal, we must put into AB all finite sums of products 


(8.1)   $\sum_i \alpha_i\beta_i$, where $\alpha_i \in A$ and $\beta_i \in B$. 

The set of such sums is the smallest ideal of R which contains all products $\alpha\beta$, and 
we denote this product ideal by AB. (This use of the product notation is different 
from its use in group theory [Chapter 2 (8.5)].) The definition of multiplication of 
ideals is not as simple as we might hope, but it works reasonably well. 

Notice that multiplication of ideals is commutative and associative, and that R 
is a unit element. This is why R = (1) is often called the unit ideal: 


(8.2) AR = RA= A, AB = BA, A(BC) = (AB)C. 


(8.3) Proposition. 
(a) The product of principal ideals is principal: If $A = (\alpha)$ and $B = (\beta)$, then 
$AB = (\alpha\beta)$. 
(b) Assume that $A = (\alpha)$ is principal, but let B be arbitrary. Then 


$AB = \alpha B = \{\alpha\beta \mid \beta \in B\}$. 


(c) Let $\alpha_1, \ldots, \alpha_m$ and $\beta_1, \ldots, \beta_n$ be generators for the ideals A and B respectively. 
Then AB is generated as an ideal by the mn products $\alpha_i\beta_j$. 


We leave this proof as an exercise. □ 


In analogy with divisibility of elements of a ring, we say that an ideal A 
divides another ideal B if there is an ideal C such that B = AC. 

To see how multiplication of ideals can be used, let us go back to the example 
d = −5, in which $2 \cdot 3 = (1 + \delta)(1 - \delta)$. For uniqueness of factorization to hold 
in the ring $R = \mathbb{Z}[\delta]$, there would have to be an element $p \in R$ dividing both 2 and 
$1 + \delta$. This is the same as saying that 2 and $1 + \delta$ should be in the principal ideal 
(p). There is no such element. However, there is an ideal, not a principal ideal, 
which contains 2 and $1 + \delta$, namely the ideal generated by these two elements. This 
ideal $A = (2, 1 + \delta)$ is depicted in Figure (7.10). We can make three other ideals 
using the factors of 6: 


$\bar A = (2, 1 - \delta)$,   $B = (3, 1 + \delta)$,   $\bar B = (3, 1 - \delta)$. 


The first of these ideals is denoted by $\bar A$ because it is the complex conjugate of 
the ideal A: 


(8.4)   $\bar A = \{\bar\alpha \mid \alpha \in A\}$. 


As a lattice, $\bar A$ is obtained by reflecting the lattice A about the real axis. That the 
complex conjugate of any ideal is an ideal is easily seen. Actually, it happens that 
our ideal A is equal to its complex conjugate $\bar A$, because $1 - \delta = 2 - (1 + \delta) \in A$. 
This is an accidental symmetry of the lattice A: The ideals B and $\bar B$ are not the same. 

Now let us compute the products of these ideals. According to Proposition 
(8.3c), the ideal $A\bar A$ is generated by the four products of the generators $(2, 1 + \delta)$ 
and $(2, 1 - \delta)$ of A and $\bar A$: 


$A\bar A = (4, 2 + 2\delta, 2 - 2\delta, 6)$. 

Each of these four generators is divisible by 2, so $A\bar A \subset (2)$. On the other hand, 
$2 = 6 - 4$ is in $A\bar A$. Therefore $(2) \subset A\bar A$, so 

$A\bar A = (2)$. 

[The notation (2) is ambiguous, because it can denote both 2R and $2\mathbb{Z}$. It stands for 
2R here.] Next, the product AB is generated by the four products: 

$AB = (6, 2 + 2\delta, 3 + 3\delta, -4 + 2\delta)$. 

Each of these four elements is divisible by $1 + \delta$. Since $1 + \delta = (3 + 3\delta) - (2 + 2\delta)$ is in AB, we find that 
$AB = (1 + \delta)$. Similarly, $\bar A\bar B = (1 - \delta)$ and $B\bar B = (3)$. 
It follows that the principal ideal (6) is the product of the four ideals: 

(8.5)   $(6) = (2)(3) = (A\bar A)(B\bar B) = (AB)(\bar A\bar B) = (1 + \delta)(1 - \delta)$. 


Isn't this beautiful? The ideal factorization $(6) = A\bar A B\bar B$ has provided a common 
refinement of the two factorizations (2.7). 
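
These products can also be verified mechanically. The Python sketch below is an illustration added here under stated assumptions, not the method of the text: an element $a + b\delta$ of $\mathbb{Z}[\sqrt{-5}]$ is stored as the pair (a, b), an ideal is stored as the lattice spanned by its generators and their multiples by $\delta$, and two ideals are compared through a canonical lattice basis of the shape ((m, 0), (x, g)). All function names are our own. Products of ideals are formed from products of generators, as in Proposition (8.3c).

```python
from math import gcd


def mul(u, v):
    """Product of a + b*delta and c + e*delta, using delta^2 = -5."""
    (a, b), (c, e) = u, v
    return (a * c - 5 * b * e, a * e + b * c)


def lattice_basis(vectors):
    """Canonical basis ((m, 0), (x, g)), with m, g > 0 and 0 <= x < m, of the
    rank 2 lattice in Z^2 spanned by the given integer vectors."""
    vecs = sorted({v for v in vectors if v != (0, 0)})
    # Euclidean algorithm on second coordinates, until one survivor remains.
    while sum(1 for v in vecs if v[1] != 0) > 1:
        pivot = min((v for v in vecs if v[1] != 0), key=lambda v: abs(v[1]))
        reduced = {pivot}
        for v in vecs:
            if v != pivot:
                q = v[1] // pivot[1]
                reduced.add((v[0] - q * pivot[0], v[1] - q * pivot[1]))
        reduced.discard((0, 0))
        vecs = sorted(reduced)
    m = 0
    for v in vecs:
        if v[1] == 0:
            m = gcd(m, v[0])
    x, g = next(v for v in vecs if v[1] != 0)
    if g < 0:
        x, g = -x, -g
    return (m, 0), (x % m, g)


def ideal(*gens):
    """Canonical lattice basis of the ideal generated by gens (not all zero)."""
    delta = (0, 1)
    return lattice_basis([w for g in gens for w in (g, mul(g, delta))])


def product(gens1, gens2):
    """Product ideal, generated by the pairwise products of the generators."""
    return ideal(*[mul(u, v) for u in gens1 for v in gens2])


A, A_bar = [(2, 0), (1, 1)], [(2, 0), (1, -1)]
B, B_bar = [(3, 0), (1, 1)], [(3, 0), (1, -1)]

assert product(A, A_bar) == ideal((2, 0))      # A * A-bar = (2)
assert product(A, B) == ideal((1, 1))          # A * B = (1 + delta)
assert product(A_bar, B_bar) == ideal((1, -1)) # A-bar * B-bar = (1 - delta)
assert product(B, B_bar) == ideal((3, 0))      # B * B-bar = (3)
assert ideal(*A) == ideal(*A_bar)              # the accidental symmetry of A
assert ideal(*B) != ideal(*B_bar)              # B and B-bar differ
```

Comparing canonical bases is what makes the equality test meaningful: two generating sets define the same ideal exactly when they span the same lattice.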

The rest of this section is devoted to proving unique factorization of ideals in 
the rings of integers of an imaginary quadratic number field. We will follow the dis- 
cussion of factorization of elements as closely as possible. 


The first thing to do is to find an analogue for ideals of the notion of a prime 
element. 


(8.6) Proposition. Let P be an ideal of a ring R which is not the unit ideal. The 
following conditions are equivalent: 


(i) If $\alpha, \beta$ are elements of R such that $\alpha\beta \in P$, then $\alpha \in P$ or $\beta \in P$. 
(ii) If A, B are ideals of R such that $AB \subset P$, then $A \subset P$ or $B \subset P$. 
(iii) The quotient ring R/P is an integral domain. 


An ideal which satisfies one of these conditions is called a prime ideal. 


For example, every maximal ideal is prime, because if M is maximal, then 
R/M is a field, and a field is an integral domain. The zero ideal of a ring R is prime 
if and only if R is an integral domain. 


Proof of the Proposition: The conditions for $\bar R = R/P$ to be an integral 
domain are that $\bar R \neq 0$ and that $\bar\alpha\bar\beta = 0$ implies $\bar\alpha = 0$ or $\bar\beta = 0$. These conditions 
translate back to $P \neq R$ and if $\alpha\beta \in P$ then $\alpha \in P$ or $\beta \in P$. Thus (i) and (iii) are 
equivalent. The fact that (ii) implies (i) is seen by taking $A = (\alpha)$ and $B = (\beta)$. The 
only surprising implication is that (i) implies (ii). Assume that (i) holds, and let A, B 
be ideals such that $AB \subset P$. If A is not contained in P, there is some element 
$\alpha \in A$ which is not in P. If $\beta$ is an element of B, then $\alpha\beta \in AB$; hence $\alpha\beta \in P$. 
By part (i), $\beta \in P$. Since this is true for all of its elements, $B \subset P$ as required. □ 


We now go back to imaginary quadratic number fields. 


(8.7) Lemma. Let $A \subset B$ be lattices in $\mathbb{R}^2$. There are only finitely many lattices L 
between A and B, that is, such that $A \subset L \subset B$. 


Proof. Let $(\alpha_1, \alpha_2)$ be a lattice basis for A, and let P be the parallelogram with 
vertices $0, \alpha_1, \alpha_2, \alpha_1 + \alpha_2$. There are finitely many elements of B contained in P 
[Chapter 5 (4.12)], so if L is a lattice between A and B, there are finitely many 
possibilities for the set $L \cap P$. Call this set S. The proof will be completed by showing 
that S and A determine the lattice L. To show this, let $\gamma$ be an element of L. Then 
there is an element $\alpha \in A$ such that $\gamma - \alpha$ is in P, hence in S. [See the proof of 
(4.14) in Chapter 5.] Symbolically, we have $L = S + A$. This describes L in terms 
of S and A, as required. □ 


(8.8) Proposition. Let R be the ring of integers in an imaginary quadratic number 
field. 


(a) Let B be a nonzero ideal of R. There are finitely many ideals between B and R. 
(b) Every proper ideal of R is contained in a maximal ideal. 
(c) The nonzero prime ideals of R are the maximal ideals. 


Proof. 


(a) This follows from Lemma (8.7). 

(b) Let B be a proper ideal. Then B is contained in only finitely many ideals. We 
can search through them to find a maximal ideal. 

(c) We have already remarked that maximal ideals are prime. Conversely, let P be a 
nonzero prime ideal. Then P has finite index in R. So R/P is a finite integral 
domain, and hence it is a field [Chapter 10 (6.4)]. This shows that P is a maximal 
ideal. □ 


(8.9) Theorem. Let R be the ring of integers in an imaginary quadratic field F. 
Every nonzero ideal of R which is not the whole ring is a product of prime ideals. 
This factorization is unique, up to order of the factors. 


This remarkable theorem can be extended to other rings of algebraic integers, 
but it is a very special property of such rings. Most rings do not admit unique 
factorization of ideals. Several things may fail, and we want to take particular note of one 
of them. We know that a principal ideal $(\alpha)$ contains another principal ideal $(\beta)$ if 
and only if $\alpha$ divides $\beta$ in the ring. So the definition of a prime element $\pi$ can be 
restated as follows: If $(\pi) \supset (\alpha\beta)$, then $(\pi) \supset (\alpha)$ or $(\pi) \supset (\beta)$. The second of the 
equivalent definitions (8.6) of a prime ideal is the analogous statement for ideals: If 
$P \supset AB$, then $P \supset A$ or $P \supset B$. So if inclusion of ideals were equivalent with 
divisibility, the proof of uniqueness of factorizations would carry over to ideals. 
Unfortunately the cumbersome definition of product ideal causes trouble. In most rings, 
the inclusion $A \supset B$ does not imply that A divides B. This weakens the analogy 
between prime ideal and prime element. It will be important to establish the 
equivalence of inclusion and divisibility in the particular rings we are studying. This is 
done below, in Proposition (8.11). 

We now proceed with the proof of Theorem (8.9). For the rest of this section, 
R will denote the ring of integers in an imaginary quadratic number field. The proof 
is based on the following lemma: 


(8.10) Main Lemma. Let R be the ring of integers in an imaginary quadratic 
number field. The product of a nonzero ideal and its conjugate is a principal ideal of 
R generated by an ordinary integer: 


$\bar{A}A = (n)$, for some $n \in \mathbb{Z}$.


The most important point here is that for every ideal A there is some ideal B such
that AB is principal. That $\bar{A}$ does the job and that the product ideal is
generated by an ordinary integer are less important points.

We will prove the lemma at the end of the section. Let us assume it for now 
and derive some consequences for multiplication of ideals. Because these conse- 
quences depend on the Main Lemma, they are not true for general rings. 


(8.11) Proposition. Let R be the ring of integers in an imaginary quadratic num- 
ber field. 


(a) Cancellation Law: Let A, B, C be nonzero ideals of R. If $AB \supset AC$, then
$B \supset C$. If AB = AC, then B = C.


(b) If A and B are nonzero ideals of R, then $A \supset B$ if and only if A divides B,
that is, if and only if B = AC for some ideal C.


(c) Let P be a nonzero prime ideal of R. If P divides a product AB of ideals, then
P divides one of the factors A or B.


Proof. (a) Assume that $AB \supset AC$. If $A = (\alpha)$ is principal, then
$AB = \alpha B$ and $AC = \alpha C$ (8.3). Viewing these sets as subsets of the complex
numbers, we multiply the relation $\alpha B \supset \alpha C$ on the left by
$\alpha^{-1}$ to conclude that $B \supset C$. So the assertion is true when A is
principal. In general, if $AB \supset AC$, then multiply both sides by $\bar{A}$ and
apply the Main Lemma: $nB = \bar{A}AB \supset \bar{A}AC = nC$, and apply what has been
shown. The case that AB = AC is the same.


(b) The implication which is not clear is that if A contains B then A divides B. We
will first check this when $A = (\alpha)$ is principal. In this case, to say that
$(\alpha) \supset B$ means that $\alpha$ divides every element $\beta$ of B. Let
$C = \alpha^{-1}B$ be the set of quotients, that is, the set of elements
$\alpha^{-1}\beta$ with $\beta \in B$. You can check that C is an ideal and that
$\alpha C = B$. Hence B = AC in this case. Now let A be arbitrary, and assume that
$A \supset B$. Then $(n) = \bar{A}A \supset \bar{A}B$. By what has already been shown,
there is an ideal C such that $nC = \bar{A}B$, or $\bar{A}AC = \bar{A}B$. By the
Cancellation Law, AC = B.


(c) To prove part (c) of the proposition, we apply part (b) to translate divisibility into 
inclusion. Then (c) follows from the definition of prime ideal. □


Proof of Theorem (8.9). There are two things to prove. First we must show that every
proper, nonzero ideal A is a product of prime ideals. If A is not itself prime, then
it is not maximal, so we can find a proper ideal $A_1$ strictly larger than A. Then
$A_1$ divides A (8.11b), so we can write $A = A_1B_1$. It follows that
$A \subset B_1$. Moreover, if we had $A = B_1$, the Cancellation Law would imply
$R = A_1$, contradicting the fact that $A_1$ is a proper ideal. Thus
$A \subsetneq B_1$. Similarly, $A \subsetneq A_1$. Since there are only finitely many
ideals between A and R, this process of factoring an ideal terminates. When it does,
all factors will be maximal, and hence prime. So every proper ideal A can be factored
into primes.

Now to prove uniqueness, we apply the property (8.11c) of prime ideals: If
$P_1 \cdots P_r = Q_1 \cdots Q_s$, with $P_i$, $Q_j$ prime, then $P_1$ divides
$Q_1 \cdots Q_s$, and hence it divides one of the factors, say $Q_1$. Since $Q_1$ is
maximal, $P_1 = Q_1$. Cancel by (8.11a) and use induction on r. □


(8.12) Theorem. The ring of integers R is a unique factorization domain if and
only if it is a principal ideal domain. If so, then the factorizations of elements
and of ideals correspond naturally.


Proof. We already know that a principal ideal domain has unique factorization
(2.12). Conversely, suppose that R is a unique factorization domain, and let P be any
nonzero prime ideal of R. Then P contains an irreducible element, say $\pi$. For, any
nonzero element $\alpha$ of P is a product of irreducible elements, and, by definition
of prime ideal, P contains one of its irreducible factors. By (2.8), an irreducible
element $\pi$ is prime, that is, $(\pi)$ is a prime ideal. By (8.6), $(\pi)$ is
maximal. Since $(\pi) \subset P$, it follows that $(\pi) = P$, hence that P is
principal. By Theorem (8.9), every nonzero ideal A is a product of primes; hence it is
principal (8.3a). Thus R is a principal ideal domain. The last assertion of the
theorem is clear from (2.2). □


Proof of the Main Lemma (8.10). We can generate A as a lattice by two elements,
say $\alpha, \beta$. Then A is certainly generated as an ideal by these same elements,
and moreover $\bar\alpha, \bar\beta$ generate $\bar{A}$. Hence the four products
$\bar\alpha\alpha$, $\bar\alpha\beta$, $\bar\beta\alpha$, $\bar\beta\beta$ generate the
ideal $\bar{A}A$. Consider the three elements $\bar\alpha\alpha$, $\bar\beta\beta$, and
$\bar\beta\alpha + \bar\alpha\beta$ of $\bar{A}A$. They are all equal to their
conjugates and hence are rational numbers. Since they are algebraic integers, they are
ordinary integers. Let n be their greatest common divisor in $\mathbb{Z}$. Then n is a
linear combination of $\bar\alpha\alpha$, $\bar\beta\beta$,
$\bar\beta\alpha + \bar\alpha\beta$ with integer coefficients. Hence n is in the
product ideal $\bar{A}A$. Therefore $\bar{A}A \supset (n)$. If we show that n divides
each of the four generators of the ideal $\bar{A}A$ in R, then it will follow that
$(n) \supset \bar{A}A$, hence that $(n) = \bar{A}A$, as was to be shown.

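Now by construction, n divides $\bar\alpha\alpha$ and $\bar\beta\beta$ in
$\mathbb{Z}$, hence in R. So we have to show that n divides $\bar\alpha\beta$ and
$\bar\beta\alpha$ in R. The elements $(\bar\alpha\beta)/n$ and $(\bar\beta\alpha)/n$
are roots of the polynomial $x^2 - rx + s$, where


$r = \frac{\bar\beta\alpha + \bar\alpha\beta}{n}$ and $s = \frac{\bar\alpha\alpha}{n} \cdot \frac{\bar\beta\beta}{n}$.


By definition of n, these elements r, s are integers, so this is a monic equation in
$\mathbb{Z}[x]$. Hence $(\bar\alpha\beta)/n$ and $(\bar\beta\alpha)/n$ are algebraic
integers, as required. □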


Note. This is the only place where the definition of algebraic integer is used
directly. The lemma would be false if we took a smaller ring than R, for example, if
we didn't take the elements with half integer coefficients when $d \equiv 1$ (modulo 4).
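The computation in the proof of the Main Lemma can be tried numerically. The following Python sketch is only an illustration under stated assumptions: elements $a + b\delta$ of $\mathbb{Z}[\delta]$ (with $\delta^2 = d$ and $d \equiv 2$ or 3 modulo 4) are stored as integer pairs (a, b), the ideal is given by a lattice basis, and the helper names `mul`, `conj`, and `main_lemma_generator` are hypothetical, not taken from the text.

```python
from math import gcd

def mul(x, y, d):
    """Product of x = a1 + b1*delta and y = a2 + b2*delta, where delta**2 = d."""
    (a1, b1), (a2, b2) = x, y
    return (a1 * a2 + d * b1 * b2, a1 * b2 + a2 * b1)

def conj(x):
    """Conjugate of a + b*delta, namely a - b*delta."""
    a, b = x
    return (a, -b)

def main_lemma_generator(alpha, beta, d):
    """Return the integer n built in the proof from a lattice basis (alpha, beta)."""
    aa = mul(conj(alpha), alpha, d)
    bb = mul(conj(beta), beta, d)
    ab = mul(conj(alpha), beta, d)
    ba = mul(conj(beta), alpha, d)
    trace = (ab[0] + ba[0], ab[1] + ba[1])
    # These three elements equal their conjugates, so the delta-coefficient vanishes.
    assert aa[1] == 0 and bb[1] == 0 and trace[1] == 0
    n = gcd(gcd(aa[0], bb[0]), trace[0])
    # n must divide each of the four generators of the product ideal in R.
    for g in (aa, ab, ba, bb):
        assert g[0] % n == 0 and g[1] % n == 0
    return n

# Assumed example: A = (2, 1 + delta) in Z[delta] with delta**2 = -5.
print(main_lemma_generator((2, 0), (1, 1), -5))  # prints 2
```

For this ideal the three integers are 4, 6, and 4, so n = 2, in agreement with the relation $\bar{A}A = (2)$.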


9. THE RELATION BETWEEN PRIME IDEALS OF R 
AND PRIME INTEGERS 


We saw in Section 5 how the primes in the ring of Gauss integers are related to in- 
teger primes. A similar analysis can be made for the ring R of integers in a quadratic 
number field. The main difference is that R is usually not a principal ideal domain, 
and therefore we should speak of prime ideals rather than of prime elements. This 
complicates the analogues of parts (c) and (d) of Theorem (5.1), and we will not 
consider them here. [However, see (12.10). ] 


(9.1) Proposition. Let P be a nonzero prime ideal of R. There is an integer prime
p so that either $P = (p)$ or $\bar{P}P = (p)$. Conversely, let p be a prime integer.
There is a prime ideal P of R so that either $P = (p)$ or $\bar{P}P = (p)$.


The proof follows that of parts (a) and (b) of Theorem (5.1) closely.

The second case of (9.1) is often subdivided into two cases, according to
whether or not P and $\bar{P}$ are equal. The following terminology is customary: If
(p) is a prime ideal, then we say that p remains prime in R. If $\bar{P}P = (p)$, then
we say that p splits in R, unless $P = \bar{P}$, in which case we say that p ramifies
in R.

Let us analyze the behavior of primes further. Assume that $d \equiv 2$ or 3
(modulo 4). In this case, $R = \mathbb{Z}[\delta]$ is isomorphic to
$\mathbb{Z}[x]/(x^2 - d)$. To ask for prime ideals containing the ideal (p) is
equivalent to asking for prime ideals of the ring R/(p) [Chapter 10 (4.3)]. Note that


(9.2) $R/(p) \cong \mathbb{Z}[x]/(x^2 - d, p)$.


Interchanging the order of the two relations $x^2 - d = 0$ and $p = 0$ as in the proof
of Theorem (5.1), we find the first part of the proposition below. The second part is
obtained in the same way, using the polynomial (6.16).


(9.3) Proposition. 


(a) Assume that $d \equiv 2$ or 3 (modulo 4). An integer prime p remains prime in R if
and only if the polynomial $x^2 - d$ is irreducible over $\mathbb{F}_p$.

(b) Assume that $d \equiv 1$ (modulo 4). Then p remains prime if and only if the
polynomial $x^2 - x + \frac{1}{4}(1 - d)$ is irreducible over $\mathbb{F}_p$. □
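The criterion of Proposition (9.3) is easy to test by machine, since a quadratic polynomial over a field is irreducible exactly when it has no root. The sketch below is a brute-force illustration, assuming d is a square-free integer and p is a prime; the function names `has_root_mod_p` and `remains_prime` are hypothetical.

```python
def has_root_mod_p(b, c, p):
    """True if x^2 + b*x + c has a root in F_p (checked by trying every residue)."""
    return any((x * x + b * x + c) % p == 0 for x in range(p))

def remains_prime(d, p):
    """Criterion of Proposition (9.3) for a square-free integer d and a prime p."""
    if d % 4 == 1:
        b, c = -1, (1 - d) // 4   # the polynomial x^2 - x + (1 - d)/4
    else:
        b, c = 0, -d              # the polynomial x^2 - d
    # A quadratic over a field is irreducible exactly when it has no root.
    return not has_root_mod_p(b, c, p)

print([p for p in (2, 3, 5) if remains_prime(-67, p)])  # prints [2, 3, 5]
print([p for p in (2, 3) if remains_prime(-14, p)])     # prints []
```

Python's `%` operator returns a nonnegative residue for negative d, so the test `d % 4 == 1` works for imaginary quadratic fields as written.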


10. IDEAL CLASSES IN IMAGINARY QUADRATIC FIELDS 


As before, R denotes the ring of integers in an imaginary quadratic number field. In 
order to analyze the extent to which uniqueness of factorization of elements fails in 
R, we introduce an equivalence relation on ideals which is compatible with ideal 
multiplication and such that the principal ideals form one equivalence class. It is
reasonably clear which relation to use: We call two ideals A, B similar ($A \sim B$)
if there are nonzero elements $\sigma, \tau \in R$ so that


(10.1) $\sigma B = \tau A$.


This is an equivalence relation. The equivalence classes for this relation are called
ideal classes, and the ideal class of A will be denoted by $\langle A \rangle$.

We could also take the element $\lambda = \sigma^{-1}\tau$ of the quadratic number
field $F = \mathbb{Q}[\delta]$ and say that A and B are similar if


(10.2) $B = \lambda A$, for some $\lambda \in \mathbb{Q}[\delta]$.


Similarity has a nice geometric interpretation. Two ideals A and B are similar
if the lattices in the complex plane which represent them are similar geometric
figures, by a similarity which is orientation-preserving. To see this, note that a
lattice looks the same at all points. So a similarity can be assumed to relate 0 in A
to 0 in B. Then it will be described as a rotation followed by a stretching or
shrinking, that is, as multiplication by a complex number $\lambda$. Since
multiplication by $\lambda$ carries a nonzero element $\alpha \in A$ to an element
$\lambda\alpha = \beta \in B$, $\lambda = \beta\alpha^{-1}$ is automatically in
the field F.

An ideal B is similar to the unit ideal R if and only if $B = \lambda R$ for some
$\lambda$ in the field. Then $\lambda$ is an element of B, hence of R. In this case, B
is the principal ideal $(\lambda)$. So we have the following:


(10.3) Proposition. The ideal class $\langle R \rangle$ consists of the principal ideals. □


Figure (10.4) shows the principal ideal $(1 + \delta)$ in the ring
$\mathbb{Z}[\delta]$, where $\delta^2 = -5$.


(10.4) Figure. The principal ideal $(1 + \delta)$.


We saw in (7.9) that there are two ideal classes. Each of the ideals
$A = (2, 1 + \delta)$ and $B = (3, 1 + \delta)$, for example, represents the class of
nonprincipal ideals. In this case $2B = (1 + \delta)A$. These ideals are depicted in
Figure (10.5).


(10.5) Figure. The ideals $(2, 1 + \delta)$ and $(3, 1 + \delta)$.


(10.6) Proposition. The ideal classes form an abelian group $\mathcal{C}$, with law of
composition induced by multiplication of ideals:


$\langle A \rangle \langle B \rangle = \text{class of } AB = \langle AB \rangle$;
the class of the principal ideals is the identity: $\langle R \rangle = \langle 1 \rangle$.


Proof. If $A \sim A'$ and $B \sim B'$, then $A' = \lambda A$ and $B' = \mu B$ for some
$\lambda, \mu \in F = \mathbb{Q}[\delta]$; hence $A'B' = \lambda\mu AB$. This shows
that $\langle AB \rangle = \langle A'B' \rangle$, hence that this law of composition
is well-defined. Next, the law is commutative and associative because multiplication
of ideals is, and the class of R is an identity (8.2). Finally, $\bar{A}A = (n)$ is
principal by the Main Lemma (8.10). Since the class of the principal ideal (n) is the
identity in $\mathcal{C}$, we have
$\langle \bar{A} \rangle \langle A \rangle = \langle R \rangle$, so
$\langle \bar{A} \rangle = \langle A \rangle^{-1}$. □


(10.7) Corollary. Let R be the ring of integers in an imaginary quadratic number 
field. The following assertions are equivalent: 


(i) R is a principal ideal domain;
(ii) R is a unique factorization domain;
(iii) the ideal class group $\mathcal{C}$ of R is the trivial group.


For to say that $\mathcal{C}$ is trivial is the same as saying that every ideal is
similar to the unit ideal, which by Proposition (10.3) means that every ideal is
principal. By Theorem (8.12), this occurs if and only if R is a unique factorization
domain. □


Because of Corollary (10.7), it is natural to count the ideal classes and to
consider this count, called the class number, a measure of nonuniqueness of
factorization of elements in R. More precise information is given by the structure of
$\mathcal{C}$ as a group. As we have seen (7.9), there are two ideal classes in the
ring $\mathbb{Z}[\sqrt{-5}]$, so its ideal class group is a cyclic group of order 2
and its class number is 2.

We will now show that the ideal class group $\mathcal{C}$ is always a finite group.
The proof is based on a famous lemma of Minkowski about lattice points in convex
regions. A bounded subset S of the plane $\mathbb{R}^2$ is called convex and centrally
symmetric if it has these properties:


(10.8) (a) Convexity: If $p, q \in S$, then the line segment joining p to q is in S.
(b) Central symmetry: If $p \in S$, then $-p \in S$.


Notice that these conditions imply that $0 \in S$, unless S is empty.


(10.9) Minkowski's Lemma. Let L be a lattice in $\mathbb{R}^2$, and let S be a convex,
centrally symmetric subset of $\mathbb{R}^2$. Let $\Delta(L)$ denote the area of the
parallelogram spanned by a lattice basis for L. If


$\text{Area}(S) > 4\Delta(L)$,


then S contains a lattice point other than 0.

Proof. Define U to be the convex set similar to S, but with half the linear
dimension. In other words, we put $p \in U$ if $2p \in S$. Then U is also convex and
centrally symmetric, and $\text{Area}(U) = \frac{1}{4}\text{Area}(S)$. So the above
inequality can be restated as $\text{Area}(U) > \Delta(L)$.


(10.10) Figure. 


(10.11) Lemma. There is a nonzero element $\alpha \in L$ such that
$U \cap (U + \alpha)$ is not empty.


Proof. Let P be the parallelogram spanned by a lattice basis for L. The translates
$P + \alpha$ with $\alpha \in L$ cover the plane without overlapping except along
their edges. The heuristic reason that the lemma is true is this: There is one
translate $U + \alpha$ for each translate $P + \alpha$, and the area of U is larger
than the area of P. So the translates $U + \alpha$ must overlap. To make this precise,
we note that since U is a bounded set, it meets finitely many of the translates
$P + \alpha$, say it meets $P + \alpha_1, \ldots, P + \alpha_k$. Denote by $U_i$ the
set $(P + \alpha_i) \cap U$. Then U is cut into the pieces $U_1, \ldots, U_k$, and
$\text{Area}(U) = \sum \text{Area}(U_i)$. We translate $U_i$ back to P by subtracting
$\alpha_i$, setting $V_i = U_i - \alpha_i$, and we note that
$V_i = P \cap (U - \alpha_i)$. So $V_i$ is a subset of P, and
$\text{Area}(V_i) = \text{Area}(U_i)$. Then
$\sum \text{Area}(V_i) = \text{Area}(U) > \Delta(L) = \text{Area}(P)$. This implies
that two of the sets $V_i$ must overlap, that is, that for some $i \neq j$,
$(U - \alpha_i) \cap (U - \alpha_j)$ is nonempty. Adding $\alpha_i$ and setting
$\alpha = \alpha_i - \alpha_j$, we find that $U \cap (U + \alpha)$ is nonempty too.
Since $\alpha_i \neq \alpha_j$, the element $\alpha$ is not zero.


Returning to the proof of Minkowski's Lemma, choose $\alpha$ as in Lemma (10.11),
and let p be a point of $U \cap (U + \alpha)$. From $p \in U + \alpha$, it follows
that $p - \alpha \in U$. By central symmetry, $q = \alpha - p \in U$ too. The midpoint
between p and q is $\frac{1}{2}\alpha$, which is also in U, because U is convex.
Therefore $\alpha \in S$, as required. □


(10.12) Corollary. Any lattice L in $\mathbb{R}^2$ contains a nonzero vector $\alpha$
such that $|\alpha|^2 \le 4\Delta(L)/\pi$.

Proof. We apply Minkowski's Lemma, taking for S a circle of radius r about the origin.
The lemma guarantees the existence of a nonzero lattice point in S, provided that
$\pi r^2 > 4\Delta(L)$, or that $r^2 > 4\Delta(L)/\pi$. So for any positive number
$\epsilon$, there is a nonzero lattice point $\alpha$ with
$|\alpha|^2 < 4\Delta(L)/\pi + \epsilon$. Since there are only finitely many lattice
points in a bounded region and since $\epsilon$ can be arbitrarily small, there is a
lattice point satisfying the desired inequality. □
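The bound of Corollary (10.12) can be checked on a concrete lattice. The sketch below is an assumption-laden illustration, not part of the proof: it takes the lattice of the ideal $(2, 1 + \delta)$ in $\mathbb{Z}[\sqrt{-5}]$, with assumed basis vectors (2, 0) and (1, √5) in the plane, and searches a small window of coefficients by brute force. The window search gives an upper bound for the true minimum, which is enough to confirm the inequality. The names `area` and `shortest_sq` are hypothetical.

```python
from math import pi, sqrt

def area(b1, b2):
    """Area of the parallelogram spanned by the basis vectors b1, b2."""
    return abs(b1[0] * b2[1] - b1[1] * b2[0])

def shortest_sq(b1, b2, bound=10):
    """Smallest |m*b1 + n*b2|^2 over (m, n) != (0, 0) with |m|, |n| <= bound."""
    best = None
    for m in range(-bound, bound + 1):
        for n in range(-bound, bound + 1):
            if m == 0 and n == 0:
                continue
            x = m * b1[0] + n * b2[0]
            y = m * b1[1] + n * b2[1]
            length_sq = x * x + y * y
            if best is None or length_sq < best:
                best = length_sq
    return best

# Assumed example lattice: the ideal (2, 1 + delta), delta = sqrt(-5).
b1, b2 = (2.0, 0.0), (1.0, sqrt(5))
minkowski_bound = 4 * area(b1, b2) / pi
print(shortest_sq(b1, b2), minkowski_bound)
```

Here the shortest nonzero vector is the element 2, of squared length 4, which lies below the bound $4\Delta(L)/\pi = 8\sqrt{5}/\pi$.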


We now return to ideals in the ring R of integers in an imaginary quadratic 
field. There are two measures for the size of an ideal, which turn out to be the same. 
The first is the index in R. Since an ideal A is a sublattice of R, it has finite index: 


[R : A] = number of additive cosets of A in R.


This index can be expressed in terms of the area of the parallelogram spanned by
basis vectors:


(10.13) Lemma. Let $(a_1, a_2)$ and $(b_1, b_2)$ be lattice bases for lattices
$B \supset A$ in $\mathbb{R}^2$, and let $\Delta(A)$ and $\Delta(B)$ be the areas of
the parallelograms spanned by these bases. Then $[B : A] = \Delta(A)/\Delta(B)$.


We leave the proof as an exercise. □


(10.14) Corollary. 


(a) Let A be a plane lattice. The area $\Delta(A)$ is independent of the lattice basis for A.
(b) If $C \supset B \supset A$ are lattices, then [C : A] = [C : B][B : A].


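It is easy to compute the area $\Delta(R)$ using the description (6.14) of the ring:


(10.15) $\Delta(R) = \frac{1}{2}\sqrt{|D|} = \begin{cases} \sqrt{|d|} & \text{if } d \equiv 2, 3 \pmod 4 \\ \frac{1}{2}\sqrt{|d|} & \text{if } d \equiv 1 \pmod 4, \end{cases}$


where D is the discriminant (6.18).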

The other measure of the size of an ideal can be obtained from the Main
Lemma (8.10): We write $\bar{A}A = (n)$ and take the integer n (chosen > 0, of
course). This is analogous to the norm of an element (7.1) and is therefore called the
norm of the ideal:


(10.16) $N(A) = n$, if $\bar{A}A = (n)$.

It has the multiplicative property

(10.17) $N(AB) = N(A)N(B)$,


because $\overline{AB}AB = \bar{A}A\bar{B}B = (nm)$ if $N(B) = m$. Note also that if A
is the principal ideal $(\alpha)$, then its norm is the norm of $\alpha$:


(10.18) $N((\alpha)) = \bar\alpha\alpha = N(\alpha)$,


because $(\bar\alpha)(\alpha) = (\bar\alpha\alpha)$.


(10.19) Lemma. For any nonzero ideal A of R,
$[R : A] = N(A)$.
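Lemma (10.19) can be sanity-checked on small examples. In the sketch below, which assumes $d \equiv 2$ or 3 (modulo 4) so that $R = \mathbb{Z}[\delta]$ has lattice basis $(1, \delta)$, an ideal is given by a lattice basis of integer pairs (a, b) standing for $a + b\delta$. The index is then the absolute value of a 2 x 2 determinant, and the norm is the greatest common divisor from the proof of the Main Lemma. The function names are hypothetical.

```python
from math import gcd

def ideal_norm(alpha, beta, d):
    """N(A) for the ideal with lattice basis alpha, beta, via the Main Lemma."""
    (a1, b1), (a2, b2) = alpha, beta
    n_alpha = a1 * a1 - d * b1 * b1          # conj(alpha) * alpha
    n_beta = a2 * a2 - d * b2 * b2           # conj(beta) * beta
    trace = 2 * (a1 * a2 - d * b1 * b2)      # conj(beta)*alpha + conj(alpha)*beta
    return gcd(gcd(n_alpha, n_beta), trace)

def index_in_R(alpha, beta):
    """[R : A] as the determinant of the basis in (1, delta)-coordinates."""
    (a1, b1), (a2, b2) = alpha, beta
    return abs(a1 * b2 - a2 * b1)

# Assumed examples in Z[delta], delta**2 = -5:
# (2, 1 + delta), (3, 1 + delta), and the principal ideal (1 + delta)
# with lattice basis 1 + delta and delta*(1 + delta) = -5 + delta.
for basis in (((2, 0), (1, 1)), ((3, 0), (1, 1)), ((1, 1), (-5, 1))):
    print(index_in_R(*basis), ideal_norm(*basis, -5))
```

In each of the three cases the two numbers agree (2, 3, and 6 respectively).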


(10.20) Corollary. Multiplicative property of the index: Let A and B be nonzero 
ideals of R. Then 


$[R : AB] = [R : A][R : B]$. □


Let us defer the proof of Lemma (10.19) and derive the finiteness of the class 
number from it. 


(10.21) Theorem. Let $\mu = 2\sqrt{|D|}/\pi$. Every ideal class contains an ideal A
such that $N(A) \le \mu$.


Proof. Let A be an ideal. We have to find another ideal A' in the class of A
whose norm is not greater than $\mu$. We apply Corollary (10.12): There is a nonzero
element $\alpha \in A$ with


$N(\alpha) = |\alpha|^2 \le 4\Delta(A)/\pi$.


Then $A \supset (\alpha)$. This implies that A divides $(\alpha)$, that is, that
$AC = (\alpha)$ for some ideal C. By the multiplicative property of norms (10.17) and
by (10.18), $N(A)N(C) = N(\alpha) \le 4\Delta(A)/\pi$. Using (10.13), (10.14), and
(10.19), we write $\Delta(A) = [R : A]\Delta(R) = \frac{1}{2}N(A)\sqrt{|D|}$.
Substituting for $\Delta(A)$ and cancelling N(A), we find $N(C) \le \mu$.

Now since CA is a principal ideal, the class $\langle C \rangle$ is the inverse of
$\langle A \rangle$, i.e., $\langle C \rangle = \langle \bar{A} \rangle$. So we have
shown that $\langle \bar{A} \rangle$ contains an ideal whose norm satisfies the
required inequality. Interchanging the roles of A and $\bar{A}$ completes the proof. □


The finiteness of the class number follows easily: 


(10.22) Theorem. The ideal class group $\mathcal{C}$ is finite.


Proof. Because of (10.19) and (10.21), it is enough to show that there are
finitely many ideals with index $[R : A] \le \mu$, so it is enough to show that there
are only finitely many sublattices $L \subset R$ with $[R : L] \le \mu$. Choose an
integer $n \le \mu$, and let L be a sublattice such that [R : L] = n. Then R/L is an
abelian group of order n, so multiplication by n is the zero map on this group. The
translation of this fact to R is the statement $nR \subset L$: Sublattices of index n
contain nR. Lemma (8.7) implies that there are finitely many such lattices L. Since
there are also finitely many possibilities for n, we are done.


The ideal class group can be computed explicitly by checking which of the
sublattices $L \subset R$ of index $\le \mu$ are ideals. However, this is not
efficient. It is better to look directly for prime ideals. Let $[\mu]$ denote the
largest integer less than $\mu$.


(10.23) Proposition. The ideal class group $\mathcal{C}$ is generated by the classes
of the prime ideals P which divide integer primes $p \le [\mu]$.


Proof. We know that every class contains an ideal A of norm $N(A) \le \mu$, and
since N(A) is an integer, $N(A) \le [\mu]$. Suppose that an ideal A with norm
$\le \mu$ is factored into prime ideals: $A = P_1 \cdots P_k$. Then
$N(A) = N(P_1) \cdots N(P_k)$, by (10.17). Hence $N(P_i) \le [\mu]$ for each i. So the
classes of prime ideals P of norm $\le [\mu]$ form a set of generators of
$\mathcal{C}$, as claimed. □


To apply this proposition, we examine each prime integer $p \le [\mu]$. If p
remains prime in R, then the prime ideal (p) is principal, so its class is trivial. We
throw out these primes. If p does not remain prime in R, then we include the class of
one of its two prime ideal factors P in our set of generators. The class of the other
prime factor is its inverse. It may still happen that P is a principal ideal, in which
case we discard it. The remaining primes generate $\mathcal{C}$.

Table (10.24) gives a few values which illustrate different groups. 


TABLE 10.24 SOME IDEAL CLASS GROUPS 


   d      D    $[\mu]$   Ideal class group
  -2     -8      1       trivial
  -5    -20      2       order 2
 -13    -52      4       order 2
 -14    -56      4       order 4, cyclic
 -21    -84      5       Klein four group
 -23    -23      3       order 3
 -26   -104      6       order 6
 -47    -47      4       order 5
 -71    -71      5       order 7
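The discriminant and bound columns of Table (10.24) can be recomputed directly from the definitions. The following sketch assumes d is a negative square-free integer, takes D = d when $d \equiv 1$ (modulo 4) and D = 4d otherwise, and evaluates the largest integer below $\mu = 2\sqrt{|D|}/\pi$. The helper names are hypothetical, and the class groups themselves are not computed here.

```python
from math import floor, pi, sqrt

def discriminant(d):
    """Discriminant D of the field Q[sqrt(d)] for a square-free integer d."""
    return d if d % 4 == 1 else 4 * d

def mu_floor(d):
    """Largest integer below mu = 2*sqrt(|D|)/pi."""
    return floor(2 * sqrt(abs(discriminant(d))) / pi)

def primes_up_to(n):
    """Primes p <= n by trial division (adequate for these tiny bounds)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, p))]

for d in (-2, -5, -13, -14, -21, -23, -26, -47, -71):
    bound = mu_floor(d)
    print(d, discriminant(d), bound, primes_up_to(bound))
```

The last column printed is the list of integer primes whose prime ideal factors have to be examined when applying Proposition (10.23).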


(10.25) Examples. To apply Proposition (10.23), we factor (p) into prime ideals for
all prime integers $p \le \mu$.


(a) $d = -7$. In this case $[\mu] = 1$. Proposition (10.23) tells us that the class
group $\mathcal{C}$ is generated by the empty set of prime ideals. So $\mathcal{C}$ is
trivial, and R is a unique factorization domain.


(b) $d = -67$. Here $R = \mathbb{Z}[\eta]$, where $\eta = \frac{1}{2}(1 + \delta)$,
and $[\mu] = 5$. The ideal class group is generated by the prime ideals dividing
2, 3, 5. According to Proposition (9.3), a prime integer p remains prime in R if and
only if the polynomial $x^2 - x + 17$ is irreducible modulo p. This is true for each
of the primes 2, 3, 5. So the primes in question are principal, and the ideal class
group is trivial.


(c) $d = -14$. Here $[\mu] = 4$, so $\mathcal{C}$ is generated by prime ideals
dividing (2) and (3). The polynomial $x^2 + 14$ is reducible, both modulo 2 and
modulo 3, so by (9.3) neither of these integers remains prime in R. Say that
$(2) = P\bar{P}$ and $(3) = Q\bar{Q}$. As in the discussion of
$\mathbb{Z}[\sqrt{-5}]$, we find that $P = (2, \delta) = \bar{P}$. The ideal class
$\langle P \rangle$ has order 2 in $\mathcal{C}$.

To compute the order of the class $\langle Q \rangle$, we may compute the powers of
the ideal explicitly and find the first power whose lattice is similar to R. This is
not efficient. It is better to compute the norms of a few small elements of R, hoping
to deduce a relation among the generators. The most obvious elements to try are
$\delta$ and $1 + \delta$. But $N(\delta) = 14$ and $N(1 + \delta) = 15$. These are
not as good as we may hope for, because they involve the primes 7 and 5, whose factors
are not among our generators. We'd rather not bring in these extra primes. The element
$2 + \delta$ is better: $N(2 + \delta) = (2 + \delta)(2 - \delta) = 2 \cdot 3 \cdot 3$.
This gives us the ideal relation


$(2 + \delta)(2 - \delta) = P\bar{P}Q\bar{Q}Q\bar{Q} = P^2Q^2\bar{Q}^2$.


Since $2 + \delta$ and $2 - \delta$ are not associates, they do not generate the same
ideal. On the other hand, they generate conjugate ideals. Taking these facts into
account, the only possible prime factorizations of $(2 + \delta)$ are $PQ^2$ and
$P\bar{Q}^2$. Which case we have depends on which factor of (3) we label as Q. So we
may suppose that $(2 + \delta) = PQ^2$. Then since $(2 + \delta)$ is a principal ideal,
$\langle P \rangle \langle Q \rangle^2 = 1$ in $\mathcal{C}$. Hence
$\langle Q \rangle^2 = \langle P \rangle^{-1} = \langle P \rangle$. This shows that
$\mathcal{C}$ is the cyclic group of order 4 generated by $\langle Q \rangle$.
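The search for a small element whose norm involves only the primes 2 and 3 can be automated. The sketch below is a minimal illustration for $d = -14$, assuming elements of the form $a + \delta$ with $0 \le a \le 4$; the names `norm` and `is_smooth` are hypothetical.

```python
def norm(a, b, d):
    """Norm of a + b*delta, where delta**2 = d."""
    return a * a - d * b * b

def is_smooth(n, primes):
    """True if the positive integer n is a product of the given primes."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

d, generators = -14, (2, 3)
# Try the elements a + delta for a = 0, ..., 4.
hits = [(a, norm(a, 1, d)) for a in range(5) if is_smooth(norm(a, 1, d), generators)]
print(hits)  # prints [(2, 18)]
```

Only $2 + \delta$, of norm 18, survives. The norms 14, 15, 23, and 30 of the other candidates each involve a prime outside the generating set.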


(d) $d = -23$, and hence $R = \mathbb{Z}[\eta]$ where
$\eta = \frac{1}{2}(1 + \delta)$. Then $[\mu] = 3$, so $\mathcal{C}$ is generated by
the classes of the prime ideals dividing (2) and (3). Both of these primes split in R,
because the polynomial $x^2 - x + 6$ is reducible modulo 2 and modulo 3 (9.3). In
fact, $(2) = P\bar{P}$, where P has the lattice basis $(2, \eta)$ [see (7.8)]. This is
not a principal ideal.

Say that $(3) = Q\bar{Q}$. To determine the structure of the ideal class group, we
note that $N(\eta) = 2 \cdot 3$ and $N(1 + \eta) = 2 \cdot 2 \cdot 2$. Therefore


$(\eta)(\bar\eta) = P\bar{P}Q\bar{Q}$ and $(1 + \eta)(1 + \bar\eta) = (8) = (2)^3 = P^3\bar{P}^3$.


Interchanging the roles of $P, \bar{P}$ and of $Q, \bar{Q}$ as necessary, we obtain
$(\eta) = PQ$ and $(1 + \eta) = P^3$ or $\bar{P}^3$. Therefore
$\langle P \rangle^3 = \langle 1 \rangle$ and
$\langle Q \rangle = \langle P \rangle^{-1}$ in $\mathcal{C}$. The ideal class group
is a cyclic group of order 3. □


Proof of Lemma (10.19). This lemma is true for the unit ideal R. We will 
prove that [R : P] = N(P) if P is a prime ideal, and we will show that if P is prime 
and if A is an arbitrary nonzero ideal, then [R : AP] = [R : A][R : P]. It will follow
that if [R: A] = N(A), then [R: AP] = N(AP). Induction on the length of the 
prime factorization of an ideal will complete the proof. 


(10.26) Lemma. Let n be an ordinary integer, and let A be an ideal. Then
$[R : nA] = n^2[R : A]$.


Proof. We know that $R \supset A \supset nA$, and therefore (10.14b)
$[R : nA] = [R : A][A : nA]$. Thus we must show that $[A : nA] = n^2$. Now A is a
lattice, and nA is the sublattice obtained by stretching by the factor n:


(10.27) Figure. 3A = {+}.


Clearly, $[A : nA] = n^2$, as required. □


We return to the proof of Lemma (10.19). There are two cases to consider for
the ideal P. According to (9.1), there is an integer prime p so that either
$P = (p)$ or $\bar{P}P = (p)$.

In the first case, $N(P) = p^2$, and $AP = pA$. We can use Lemma (10.26) twice
to conclude that $[R : AP] = p^2[R : A]$ and $[R : P] = p^2[R : R] = p^2$. Thus
$[R : AP] = [R : A][R : P]$ and $[R : P] = N(P)$, as required.

In the second case, $N(P) = p$. We consider the chain of ideals
$A \supset AP \supset AP\bar{P}$. It follows from the Cancellation Law (8.11a) that
this is a strictly decreasing chain of ideals, hence that


(10.28) $[R : A] < [R : AP] < [R : AP\bar{P}]$.


Also, since $P\bar{P} = (p)$, we have $AP\bar{P} = pA$. Therefore we may apply Lemma
(10.26) again, to conclude that $[R : AP\bar{P}] = p^2[R : A]$. Since each index in
(10.28) is a proper divisor of the next, the only possibility is that
$[R : AP] = p[R : A]$. Applying this to the case A = R shows that
$[R : P] = p = N(P)$. So we find $[R : AP] = [R : A][R : P]$ and $[R : P] = N(P)$
again. This completes the proof. □


11. REAL QUADRATIC FIELDS


In this section we will take a brief look at real quadratic number fields
$\mathbb{Q}[\delta]$, where $\delta^2 = d > 0$. We will use the field
$\mathbb{Q}[\sqrt{2}]$ as an example. The ring of integers in this field is


(11.1) $R = \mathbb{Z}[\sqrt{2}] = \{a + b\sqrt{2} \mid a, b \in \mathbb{Z}\}$.


Since $\mathbb{Q}[\sqrt{d}]$ is a subfield of the real numbers, the ring of integers
is not embedded as a lattice in the complex plane, but we can represent R as a lattice
by using the coefficients (a, b) as coordinates. A slightly more convenient
representation of R as a lattice is obtained by associating to the algebraic integer
$a + b\sqrt{d}$ the point (u, v), where


(11.2) $u = a + b\sqrt{d}$, $v = a - b\sqrt{d}$.


The resulting lattice is depicted below for the case d = 2: 


(11.3) Figure. The lattice $\mathbb{Z}[\sqrt{2}]$.


Since the (u, v)-coordinates are related to the (a, b)-coordinates by the linear
transformation (11.2), there is no essential difference between the two ways of depicting
R, though since the transformation is not orthogonal, the shape of the lattice is dif- 
ferent in the two representations. 

Recall that the field $\mathbb{Q}[\sqrt{d}]$ is isomorphic to the abstractly
constructed field


(11.4) $F = \mathbb{Q}[x]/(x^2 - d)$.


Let us replace $\mathbb{Q}[\sqrt{d}]$ by F and denote the residue of x in F by
$\delta$. So this element $\delta$ is an abstract square root of d rather than the
positive real square root. Then the coordinates u, v represent the two ways that the
abstractly given field F can be embedded into the real numbers; namely u sends
$\delta \mapsto \sqrt{d}$ and v sends $\delta \mapsto -\sqrt{d}$.

For $\alpha = a + b\delta \in \mathbb{Q}[\delta]$, let us denote by $\alpha'$ the
“conjugate” element $a - b\delta$. The norm of $\alpha$ is defined to be


(11.5) $N(\alpha) = \alpha\alpha' = a^2 - db^2$,


in analogy with the imaginary quadratic case (7.1). If $\alpha$ is an algebraic
integer, then $N(\alpha)$ is an integer, not necessarily positive, and


(11.6) $N(\alpha\beta) = \alpha\beta\alpha'\beta' = N(\alpha)N(\beta)$.


With this definition of norm, the proof of unique factorization of ideals into prime
ideals in imaginary quadratic fields carries over.

There are two notable differences between real and imaginary quadratic fields.
The first is that, for real quadratic fields, ideals in the same class are not similar
geometric figures when embedded as lattices in the (u, v)-plane by (11.2). In
particular, principal ideals need not be similar to the lattice R. The reason is
simple: Multiplication by an element $\alpha = a + b\delta$ stretches the u-coordinate
by the factor $a + b\sqrt{d}$, and it stretches the v-coordinate by the different
factor $a - b\sqrt{d}$. This fact complicates the geometry slightly, and it is the
reason that we developed the imaginary quadratic case first. It does not change the
theory in an essential way: The class number is still finite.

The second difference is more important. It is that there are infinitely many
units in the ring of integers in a real quadratic field. Since the norm $N(\alpha)$ of
an algebraic integer is an ordinary integer, a unit must have norm $\pm 1$ as before
[see (7.3)], and if $N(\alpha) = \alpha\alpha' = \pm 1$, then $\pm\alpha'$ is the
inverse of $\alpha$, so $\alpha$ is a unit. For example,


(11.7) $\alpha = 1 + \sqrt{2}$, $\alpha^2 = 3 + 2\sqrt{2}$


are units in the ring $R = \mathbb{Z}[\sqrt{2}]$. Their norms are $-1$ and 1
respectively. The element $\alpha$ has infinite order in the group of units of R.
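The units (11.7) and their powers are easy to tabulate. The sketch below stores $a + b\sqrt{2}$ as the integer pair (a, b); the helpers `mul` and `norm` are hypothetical names, and the default d = 2 is an assumption matching the example.

```python
def mul(x, y, d=2):
    """Product of a1 + b1*sqrt(d) and a2 + b2*sqrt(d)."""
    (a1, b1), (a2, b2) = x, y
    return (a1 * a2 + d * b1 * b2, a1 * b2 + a2 * b1)

def norm(x, d=2):
    """N(a + b*sqrt(d)) = a^2 - d*b^2."""
    a, b = x
    return a * a - d * b * b

alpha = (1, 1)            # the unit 1 + sqrt(2)
powers = [alpha]
for _ in range(4):
    powers.append(mul(powers[-1], alpha))

print(powers)                    # [(1, 1), (3, 2), (7, 5), (17, 12), (41, 29)]
print([norm(x) for x in powers]) # [-1, 1, -1, 1, -1]
```

The coefficients grow strictly with each multiplication, so the powers are pairwise distinct, which is what infinite order means here.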

The condition $N(\alpha) = a^2 - 2b^2 = \pm 1$ for units translates in
(u, v)-coordinates to


(11.8) $uv = \pm 1$.


The units are the points of the lattice which lie on one of the two hyperbolas
$uv = 1$ and $uv = -1$. These hyperbolas are depicted in Figure (11.3). It is a
remarkable fact that real quadratic fields always have infinitely many units or, what
amounts to the same thing, that the lattice of integers always contains infinitely
many points on the hyperbola $uv = 1$. This fact is not obvious, either algebraically
or geometrically.


(11.9) Theorem. Let R be the ring of integers in a real quadratic number field. 
The group of units in R is infinite. 


(11.10) Lemma. Let $\Delta$ denote the area of the parallelogram spanned by a lattice
basis of R, in its embedding into the (u, v)-plane. There are infinitely many elements
$\beta$ of R whose norm $N(\beta)$ is bounded, in fact, such that
$|N(\beta)| \le B$, where B is any real number $> \Delta$.


Proof. In the embedding into the (u, v)-plane, the elements of norm r are the
lattice points on the hyperbola $xy = r$, and the elements whose norm is bounded in
absolute value by a positive number B are those lying in the region $\mathcal{B}$
bounded by the four branches of the hyperbolas $xy = B$, $xy = -B$.


(11.11) Figure. 


Choose an arbitrary positive real number $u_0$. Then the rectangle S whose vertices
are $(\pm u_0, \pm B/u_0)$ lies entirely in the region $\mathcal{B}$, and the area of
this rectangle is 4B. So if $B > \Delta$, then Minkowski's Lemma guarantees the
existence of a nonzero lattice point $\alpha$ in S. The norm of this point is bounded
by B. This is true for all $u_0$, and if $u_0$ is very large, the rectangle S is very
narrow. On the other hand, there are no nonzero lattice points on the u-axis, because
there are no nonzero elements in R of norm zero. So no particular lattice point is
contained in all the rectangles S. It follows that there are infinitely many lattice
points in $\mathcal{B}$. □


Since there are only finitely many integers r in the interval $-B \le r \le B$,
Lemma (11.10) implies the following corollary:


(11.12) Corollary. For some integer r, there are infinitely many elements of R of
norm r. □


Let r be an integer. We will call two elements $\beta_i = m_i + n_i\delta$ of R
congruent modulo r if r divides $\beta_1 - \beta_2$ in R. If $d \equiv 2$ or 3
(modulo 4), this just means that $m_1 \equiv m_2$ and $n_1 \equiv n_2$ (modulo r).


(11.13) Lemma. Let $\beta_1, \beta_2$ be elements of R with the same norm r, and which
are congruent modulo r. Then $\beta_1/\beta_2$ is a unit of R.


Proof. It suffices to show that $\beta_1/\beta_2$ is in R, because the same argument
will show that $\beta_2/\beta_1 \in R$, hence that $\beta_1/\beta_2$ is a unit. Let
$\beta_i' = m_i - n_i\delta$ be the conjugate of $\beta_i$. Then
$\beta_1/\beta_2 = \beta_1\beta_2'/\beta_2\beta_2' = \beta_1\beta_2'/r$. But
$\beta_2' \equiv \beta_1'$ (modulo r), so
$\beta_1\beta_2' \equiv \beta_1\beta_1' = r$ (modulo r). Therefore r divides
$\beta_1\beta_2'$, which shows that $\beta_1/\beta_2 \in R$, as required. □


Proof of Theorem (11.9). We choose r so that there are infinitely many elements
$\beta = m + n\delta$ of norm r. We partition the set of these elements according to
the congruence classes modulo r. Since there are finitely many congruence classes,
some class contains infinitely many elements. The ratio of any two of these elements
is a unit. □


12. SOME DIOPHANTINE EQUATIONS 


Diophantine equations are polynomial equations with integer coefficients, which are 
to be solved in the integers. The most famous is the Fermat Equation 


(12.1) $x^n + y^n = z^n$.


Fermat's “Last Theorem” asserts that if $n \ge 3$ this equation has no integer solutions
x,y,z, except for the trivial solutions in which one of the variables is zero. Fermat 
wrote this theorem in the margin of a book, asserting that the margin did not con- 
tain enough room for his proof. No proof is known today, though the theorem has 
been proved for all n < 10°. Also, a theorem proved by Faltings in 1983, which ap- 
plies to this equation as well as to many others, shows that there are only finitely 
many integer solutions for any given value of n. 

This section contains a few examples of Diophantine equations which can be 
solved using the arithmetic of imaginary quadratic numbers. They are included only 
as samples. An interested reader should look in a book on number theory for a more 
organized discussion. 

We have two methods at our disposal, namely arithmetic of quadratic number 
fields and congruences, and we will use both. 


(12.2) Example. Determination of the integers n such that the equation 


$x^2 + y^2 = n$


has an integer solution. 


Here the problem is to determine the integers $n$ which are sums of two squares
or, equivalently, such that there is a point with integer coordinates on the circle
$x^2 + y^2 = n$. Theorem (5.1) tells us that when $p$ is a prime, the equation $x^2 + y^2 = p$
has an integer solution if and only if either $p = 2$ or $p \equiv 1$ (modulo 4). It is not
difficult to extend this result to arbitrary integers. To do so, we interpret a sum of
squares $a^2 + b^2$ as the norm $\alpha\bar{\alpha}$ of the Gauss integer $\alpha = a + bi$. Then the problem
is to decide which integers $n$ are the norms of Gauss integers. Now if a Gauss
integer $\alpha$ is factored into Gauss primes, say $\alpha = \pi_1 \cdots \pi_k$, then its norm factors too:
$N(\alpha) = N(\pi_1) \cdots N(\pi_k)$. So if $n$ is the norm of a Gauss integer, then it is a product
of norms of Gauss primes, and conversely. The norms of Gauss primes are the
primes $p \equiv 1$ (modulo 4), the squares of primes $p \equiv 3$ (modulo 4), and the prime
2. Thus we have the following theorem:

(12.3) Theorem. The equation $x^2 + y^2 = n$ has an integer solution if and only if
every prime $p$ which is congruent 3 modulo 4 has an even exponent in the factorization
of $n$. □
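
The criterion of Theorem (12.3) is easy to test numerically. The following Python sketch is an illustration only, not part of the text: the function names and the search bound 2000 are arbitrary choices made here. It compares a brute-force search for $x^2 + y^2 = n$ with the parity condition on the exponents of the primes congruent 3 modulo 4.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """Brute force: is n = x^2 + y^2 for some integers x, y >= 0?"""
    for x in range(isqrt(n) + 1):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            return True
    return False

def factorize(n):
    """Trial-division factorization, returned as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def criterion(n):
    """Condition of Theorem (12.3): primes p = 3 (mod 4) occur to even powers."""
    return all(e % 2 == 0 for p, e in factorize(n).items() if p % 4 == 3)

# Compare the two tests on a range of positive integers (bound chosen arbitrarily).
mismatches = [n for n in range(1, 2001)
              if is_sum_of_two_squares(n) != criterion(n)]
print(mismatches)  # prints [] if the two tests agree on the whole range
```

For instance, $45 = 3^2 \cdot 5$ passes the criterion and $45 = 6^2 + 3^2$, while $21 = 3 \cdot 7$ fails it.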


(12.4) Example. Determination of the integer solutions of the equation

$y^2 + 13 = x^3$.

We factor the left side of the equation, obtaining

$(y + \delta)(y - \delta) = x^3$,

where $\delta = \sqrt{-13}$. The ring of integers $R = \mathbb{Z}[\delta]$ is not a unique factorization
domain, so we will analyze this equation using ideal factorization.


(12.5) Lemma. Let a, b be integers, and let R be any ring containing Z as a sub- 
ring. If a and b are contained in a common proper ideal A of R, then they have a 
common prime factor in Z. 


Proof. We prove the contrapositive. If $a, b$ have no common prime factor in
$\mathbb{Z}$, then we can write $1 = ra + sb$, with $r, s \in \mathbb{Z}$. This equation shows that if $a, b$ are in
an ideal $A$ of $R$, then $1 \in A$ too. Hence $A$ is not a proper ideal. □
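
The integers $r, s$ with $1 = ra + sb$ used in this proof can be computed by the extended Euclidean algorithm. A minimal Python sketch follows; the function name `extended_gcd` and the sample inputs are illustrative choices made here, not notation from the text.

```python
def extended_gcd(a, b):
    """Return (g, r, s) with g = gcd(a, b) >= 0 and r*a + s*b == g."""
    old_rem, rem = a, b
    old_r, r = 1, 0   # coefficients of a
    old_s, s = 0, 1   # coefficients of b
    while rem != 0:
        q = old_rem // rem
        old_rem, rem = rem, old_rem - q * rem
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_rem < 0:   # normalize the gcd to be nonnegative
        old_rem, old_r, old_s = -old_rem, -old_r, -old_s
    return old_rem, old_r, old_s

# Sample input: a = 2 and b = 13 have no common prime factor.
g, r, s = extended_gcd(2, 13)
print(g, r * 2 + s * 13)  # both values are 1
```

Any ideal of a ring containing $\mathbb{Z}$ that contains both 2 and 13 therefore contains $r \cdot 2 + s \cdot 13 = 1$.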


(12.6) Lemma. Let $x, y$ be an integer solution of the equation (12.4). The two
elements $y + \delta$ and $y - \delta$ have no common prime ideal factor in $R$.


Proof. Let $P$ be a prime ideal of $R$ which contains $y + \delta$ and $y - \delta$. Then
$2y \in P$ and $2\delta \in P$. Since $P$ is a prime ideal, either $2 \in P$, or else $y \in P$ and
$\delta \in P$.

In the first case, 2 and $y^2 + 13$ are not relatively prime integers by Lemma
(12.5), and since 2 is prime, it divides $y^2 + 13$ in $\mathbb{Z}$. This implies that 2 divides $x$
and that 8 divides $y^2 + 13 = x^3$. So $y$ must be odd. Then $y^2 \equiv 1$ (modulo 4); hence
$y^2 + 13 \equiv 2$ (modulo 4). This contradicts $x^3 \equiv 0$ (modulo 8).

Suppose that $y, \delta \in P$. Then $13 \in P$, and hence 13 and $y$ are not relatively
prime in $\mathbb{Z}$, that is, 13 divides $y$. Therefore 13 divides $x$, and reading the equation
$y^2 + 13 = x^3$ modulo $13^2$, we obtain $13 \equiv 0$ (modulo $13^2$), which is a contradiction.
So we have shown that $y + \delta$ and $y - \delta$ are relatively prime in $R$. □


We now read the equation $(y + \delta)(y - \delta) = (x)^3$ as an equality of principal
ideals of $R$, and we factor the right side into primes, say

$(y + \delta)(y - \delta) = (P_1 \cdots P_s)^3$.

On the right we have a cube, and the two ideals on the left have no common prime
factor. It follows that each of these ideals is a cube too, say $(y + \delta) = A^3$ and
$(y - \delta) = \bar{A}^3$ for some ideal $A$. Looking at our table of ideal classes, we find that
the ideal class group of $R$ is cyclic of order 2. So the ideal classes of $A$ and $A^3$ are
equal. Since $A^3$ is a principal ideal, so is $A$, say $A = (u + v\delta)$, for some integers
$u, v$. We have been lucky. Since the units in $R$ are $\pm 1$, $(u + v\delta)^3 = \pm(y + \delta)$.
Changing sign if necessary, we may assume that $(u + v\delta)^3 = y + \delta$.

We now complete the analysis by studying the equation $y + \delta = (u + v\delta)^3$.
We expand the right side, obtaining

$y + \delta = (u^3 - 39uv^2) + (3u^2v - 13v^3)\delta$.

So $y = u^3 - 39uv^2$ and $1 = (3u^2 - 13v^2)v$. The second equation implies that $v = \pm 1$
and that $3u^2 - 13 = \pm 1$. The only possibilities are $u = \pm 2$ and $v = -1$. Then
$y = \pm 70$ and $x = (u + v\delta)(u - v\delta) = 17$. These values do give solutions, so the
integer solutions of the equation $y^2 + 13 = x^3$ are $x = 17$ and $y = \pm 70$. □
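
Both the expansion of $(u + v\delta)^3$ and the final answer can be checked by machine. The Python sketch below is an illustration under assumptions made here: elements $a + b\delta$ of $\mathbb{Z}[\delta]$ are represented as pairs $(a, b)$, and the search window $1 \le x \le 1000$ is an arbitrary choice, so the search confirms the result only within that window.

```python
from math import isqrt

def cube(u, v):
    """Return (a, b) with (u + v*delta)^3 = a + b*delta, where delta^2 = -13."""
    a, b = u, v
    for _ in range(2):
        # (a + b*delta)(u + v*delta) = (a*u - 13*b*v) + (a*v + b*u)*delta
        a, b = a * u - 13 * b * v, a * v + b * u
    return a, b

# The closed formulas from the expansion, checked on a small grid.
formula_ok = all(
    cube(u, v) == (u**3 - 39 * u * v**2, 3 * u**2 * v - 13 * v**3)
    for u in range(-5, 6) for v in range(-5, 6)
)

# Brute-force search for y^2 + 13 = x^3 with y >= 0 in a finite window.
solutions = []
for x in range(1, 1001):
    y_squared = x**3 - 13
    if y_squared >= 0:
        y = isqrt(y_squared)
        if y * y == y_squared:
            solutions.append((x, y))

print(formula_ok, solutions)  # True [(17, 70)]
```

Indeed $17^3 = 4913 = 70^2 + 13$, and `cube(2, -1)` returns `(-70, 1)`, that is, $(2 - \delta)^3 = -70 + \delta$.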


(12.7) Example. Determination of the prime integers $p$ such that

$x^2 + 5y^2 = p$

has an integer solution.


Let $\delta = \sqrt{-5}$, and let $R = \mathbb{Z}[\delta]$. We know (9.3a) that the principal ideal $(p)$
splits in $R$ if and only if the congruence $x^2 \equiv -5$ (modulo $p$) has an integer solution.
If $(p) = P\bar{P}$ and if $P$ is a principal ideal, say $P = (a + b\delta)$, then $(p) =
(a + b\delta)(a - b\delta) = (a^2 + 5b^2)$. Since the only units in $R$ are $\pm 1$, $a^2 + 5b^2 = \pm p$,
and since $a^2 + 5b^2$ is positive, $a^2 + 5b^2 = p$.

Unfortunately, $R$ is not a principal ideal domain. So it is quite likely that $(p) =
P\bar{P}$ but that $P$ is not a principal ideal. To analyze the situation further, we use the
fact that there are exactly two ideal classes in $R$. The principal ideals form one class,
and the other class is represented by any nonprincipal ideal. The ideal
$A = (2, 1 + \delta)$ is one nonprincipal ideal, and we recall that for this ideal
$A^2 = \bar{A}A = (2)$. Now since the ideal class group is cyclic of order 2, the product of
any two ideals in the same class is principal. Suppose that $(p) = P\bar{P}$ and that $P$ is
not a principal ideal. Then $AP$ is principal, say $AP = (a + b\delta)$. Then
$(a + b\delta)(a - b\delta) = AP\bar{A}\bar{P} = (2p)$. We find that $a^2 + 5b^2 = 2p$.


(12.8) Lemma. Let $p$ be an odd prime. The congruence $x^2 \equiv -5$ (modulo $p$) has
a solution if and only if one of the two equations $x^2 + 5y^2 = p$ or $x^2 + 5y^2 = 2p$
has an integer solution.


Proof. If the congruence has a solution, then $(p) = P\bar{P}$, and the two cases are
decided as above, according to whether or not $P$ is principal. Conversely, if
$a^2 + 5b^2 = p$, then $(p)$ splits in $R$, and we can apply (9.3a). If $a^2 + 5b^2 = 2p$,
then $(a + b\delta)(a - b\delta) = (2p) = \bar{A}A(p)$. It follows from unique factorization of
ideals that $(p)$ splits too, so (9.3a) can be applied again. □


This lemma does not solve our original problem, but we have made progress. 
In most such situations we could not complete our analysis. But here we are lucky 
again, or rather this example was chosen because it admits a complete solution: The 
two cases can be distinguished by congruences. If $a^2 + 5b^2 = p$, then one of the
two integers $a, b$ is odd and the other is even. We compute the congruence modulo
4, finding that $a^2 + 5b^2 \equiv 1$ (modulo 4). Hence $p \equiv 1$ (modulo 4) in this case. If
$a^2 + 5b^2 = 2p$, we compute the congruences modulo 8. Since $p \equiv 1$ or 3 (modulo
4), we know that $2p \equiv 2$ or 6 (modulo 8). Any square is congruent 0, 1, or 4 (modulo
8). Hence $5b^2 \equiv 0$, 5, or 4 (modulo 8), which shows that $a^2 + 5b^2$ can not be
congruent to 2 (modulo 8). Thus $p \equiv 3$ (modulo 4) in this case. We have therefore
proved the following lemma:


(12.9) Lemma. Let $p$ be an odd prime. Assume that the congruence $x^2 \equiv -5$
(modulo $p$) has a solution. Then $x^2 + 5y^2 = p$ has an integer solution if $p \equiv 1$
(modulo 4), and $x^2 + 5y^2 = 2p$ has an integer solution if $p \equiv 3$ (modulo 4).
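
The two congruence computations behind Lemma (12.9) involve only finitely many residues, so they can be listed exhaustively. A short Python sketch follows; the variable names are chosen here for illustration.

```python
# Residues of a^2 + 5*b^2 modulo 8, over all residues a, b modulo 8.
residues_mod8 = sorted({(a * a + 5 * b * b) % 8
                        for a in range(8) for b in range(8)})

# Residues of a^2 + 5*b^2 modulo 4 when a and b have opposite parity.
residues_mod4 = sorted({(a * a + 5 * b * b) % 4
                        for a in range(4) for b in range(4)
                        if (a + b) % 2 == 1})

print(residues_mod8)  # [0, 1, 4, 5, 6]: the residue 2 never occurs
print(residues_mod4)  # [1]
```

So $a^2 + 5b^2 = 2p$ forces $2p \equiv 6$ (modulo 8), and $a^2 + 5b^2 = p$ with $p$ odd forces $p \equiv 1$ (modulo 4), as in the argument above.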


There remains finally the problem of characterizing the odd primes $p$ such that
the congruence $x^2 \equiv -5$ has a solution modulo $p$. This is done by means of the
amazing Quadratic Reciprocity Law, which asserts that, for an odd prime $p \ne 5$,
$x^2 \equiv 5$ (modulo $p$) has a solution if and only if $x^2 \equiv p$ (modulo 5) has one! And the
second congruence has a solution if and only if $p \equiv \pm 1$ (modulo 5). Moreover, when
$p \equiv 1$ (modulo 4), $-1$ is a square modulo $p$, so $-5$ is a square modulo $p$ exactly
when 5 is. Combining this with the previous lemma
and with the fact that $-1$ is a square modulo 5, we find:


(12.10) Theorem. Let $p$ be an odd prime different from 5. The equation $x^2 + 5y^2 = p$
has an integer solution if and only if $p \equiv 1$ (modulo 4) and $p \equiv \pm 1$ (modulo 5). □
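
Theorem (12.10) and Lemma (12.8) can be tested numerically on small primes. The Python sketch below is an illustration only: the helper names and the bound 1000 are choices made here, and the prime $p = 5$ is left out because $5 = 0^2 + 5 \cdot 1^2$ is represented although $5 \not\equiv \pm 1$ (modulo 5).

```python
from math import isqrt

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def represented(m):
    """Is m = x^2 + 5*y^2 for some integers x, y?"""
    for y in range(isqrt(m // 5) + 1):
        rest = m - 5 * y * y
        x = isqrt(rest)
        if x * x == rest:
            return True
    return False

def minus5_is_square_mod(p):
    """Does x^2 = -5 (modulo p) have a solution?"""
    return any((x * x + 5) % p == 0 for x in range(p))

odd_primes = [p for p in range(3, 1000) if is_prime(p) and p != 5]

# Theorem (12.10): p is represented iff p = 1 (mod 4) and p = +-1 (mod 5).
exceptions_12_10 = [p for p in odd_primes
                    if represented(p) != (p % 4 == 1 and p % 5 in (1, 4))]

# Lemma (12.8): -5 is a square mod p iff p or 2p is represented.
exceptions_12_8 = [p for p in odd_primes
                   if minus5_is_square_mod(p)
                   != (represented(p) or represented(2 * p))]

print(exceptions_12_10, exceptions_12_8)  # [] []
```

For example, $29 = 3^2 + 5 \cdot 2^2$ with $29 \equiv 1$ (modulo 4) and $29 \equiv -1$ (modulo 5), while for $p = 7$ only $2p = 14 = 3^2 + 5 \cdot 1^2$ is represented.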


Nullum vero dubium nobis esse videtur, 
quin multa eaque egregia in hoc genere adhuc lateant 
in quibus alii vires suas exercere possint. 


Karl Friedrich Gauss 


EXERCISES 
1. Factorization of Integers and Polynomials


1. Let $a, b$ be positive integers whose sum is a prime $p$. Prove that their greatest common
divisor is 1.
2. Define the greatest common divisor of a set of integers, and prove its existence.
3. Prove that if $d$ is the greatest common divisor of $a_1, \dots, a_n$, then the greatest common
divisor of $a_1/d, \dots, a_n/d$ is 1.
4. (a) Prove that if $n$ is a positive integer which is not a square of an integer, then $\sqrt{n}$ is
not a rational number.
(b) Prove the analogous statement for $n$th roots.
5. (a) Let $a, b$ be integers with $a \ne 0$, and write $b = aq + r$, where $0 \le r < |a|$. Prove
that the two greatest common divisors $(a, b)$ and $(a, r)$ are equal.
(b) Describe an algorithm, based on (a), for computing the greatest common divisor.
(c) Use your algorithm to compute the greatest common divisors of the following:
(a) 1456, 235, (b) 123456789, 135792468.


6. Compute the greatest common divisor of the following polynomials: $x^3 - 6x^2 + x + 4$,
$x^5 - 6x + 1$.

7. Prove that if two polynomials $f, g$ with coefficients in a field $F$ factor into linear factors
in $F$, then their greatest common divisor is the product of their common linear factors.

8. Factor the following polynomials into irreducible factors in $\mathbb{F}_p[x]$.
(a) $x^3 + x + 1$, $p = 2$ (b) $x^2 - 3x - 3$, $p = 5$ (c) $x^2 + 1$, $p = 7$

9. Euclid proved that there are infinitely many prime integers in the following way: If
$p_1, \dots, p_k$ are primes, then any prime factor $p$ of $n = (p_1 \cdots p_k) + 1$ must be different
from all of the $p_i$.
(a) Adapt this argument to show that for any field $F$ there are infinitely many monic
irreducible polynomials in $F[x]$.
(b) Explain why the argument fails for the formal power series ring $F[[x]]$.

10. Partial fractions for integers:
(a) Write the fraction $r = 7/24$ in the form $r = a/8 + b/3$.
(b) Prove that if $n = uv$, where $u$ and $v$ are relatively prime, then every fraction $r =
m/n$ can be written in the form $r = a/u + b/v$.
(c) Let $n = n_1 n_2 \cdots n_k$ be the factorization of an integer $n$ into powers of distinct primes:
$n_i = p_i^{e_i}$. Prove that every fraction $r = m/n$ can be written in the form
$r = m_1/n_1 + \cdots + m_k/n_k$.

11. Chinese Remainder Theorem:
(a) Let $n, m$ be relatively prime integers, and let $a, b$ be arbitrary integers. Prove that
there is an integer $x$ which solves the simultaneous congruence $x \equiv a$ (modulo $m$)
and $x \equiv b$ (modulo $n$).
(b) Determine all solutions of these two congruences.

12. Solve the following simultaneous congruences.
(a) $x \equiv 3$ (modulo 15), $x \equiv 5$ (modulo 8), $x \equiv 2$ (modulo 7).
(b) $x \equiv 13$ (modulo 43), $x \equiv 7$ (modulo 71).

13. Partial fractions for polynomials:
(a) Prove that every rational function in $\mathbb{C}(x)$ can be written as sum of a polynomial and
a linear combination of functions of the form $1/(x - a)^i$.
(b) Find a basis for $\mathbb{C}(x)$ as vector space over $\mathbb{C}$.

*14. Let $F$ be a subfield of $\mathbb{C}$, and let $f \in F[x]$ be an irreducible polynomial. Prove that $f$ has
no multiple root in $\mathbb{C}$.

15. Prove that the greatest common divisor of two polynomials $f$ and $g$ in $\mathbb{Q}[x]$ is also their
greatest common divisor in $\mathbb{C}[x]$.

16. Let $a$ and $b$ be relatively prime integers. Prove that there are integers $m, n$ such that
$a^m + b^n \equiv 1$ (modulo $ab$).


2. Unique Factorization Domains, Principal Ideal Domains, and Euclidean Domains


1. Prove or disprove the following.
(a) The polynomial ring R[x, y] in two variables is a Euclidean domain.
(b) The ring $\mathbb{Z}[x]$ is a principal ideal domain.
2. Prove that the following rings are Euclidean domains.
(a) $\mathbb{Z}[\zeta]$, $\zeta = e^{2\pi i/3}$ (b) $\mathbb{Z}[\sqrt{-2}]$.
3. Give an example showing that division with remainder need not be unique in a Euclidean 
domain. 
4. Let m,n be two integers. Prove that their greatest common divisor in Z is the same as 
their greatest common divisor in Z[i]. 
5. Prove that every prime element of an integral domain is irreducible. 
6. Prove Proposition (2.8), that a domain R which has existence of factorizations is a 
unique factorization domain if and only if every irreducible element is prime. 
7. Prove that in a principal ideal domain R, every pair a, b of elements, not both zero, has 
a greatest common divisor d, with these properties: 
(i) d = ar + bs, for some r,s € R; 
(ii) d divides a and b; 
(iii) if e € R divides a and b, it also divides d. 
Moreover, d is determined up to unit factor. 
8. Find the greatest common divisor of $(11 + 7i, 18 - i)$ in $\mathbb{Z}[i]$.
9. (a) Prove that $2, 3, 1 + \sqrt{-5}$ are irreducible elements of the ring $R = \mathbb{Z}[\sqrt{-5}]$ and
that the units of this ring are $\pm 1$.
(b) Prove that existence of factorizations is true for this ring.
10. Prove that the ring $\mathbb{R}[[t]]$ of formal real power series is a unique factorization domain.
11. (a) Prove that if R is an integral domain, then two elements a, b are associates if and 
only if they differ by a unit factor. 
*(b) Give an example showing that (a) is false when R is not an integral domain. 

12. Let R be a principal ideal domain. 

(a) Prove that there is a least common multiple [a, b] = m of two elements which are not 
both zero such that a and b divide m, and that if a, b divide an element r € R, then 
m divides r. Prove that m is unique up to unit factor. 

(b) Denote the greatest common divisor of a and b by (a, b). Prove that (a, b)[a, b] is an 
associate of ab. 

13. If a, b are integers and if a divides b in the ring of Gauss integers, then a divides b in Z. 

14. (a) Prove that the ring $R$ (2.4) obtained by adjoining $2^k$-th roots $x_k$ of $x$ to a polynomial
ring is the union of the polynomial rings $F[x_k]$.
(b) Prove that there is no factorization of $x_1$ into irreducible factors in $R$.

15. By a refinement of a factorization $a = b_1 \cdots b_k$ we mean the expression for $a$ obtained by
factoring the terms $b_i$. Let $R$ be the ring (2.4). Prove that any two factorizations of the
same element $a \in R$ have refinements, all of whose factors are associates.

16. Let $R$ be the ring $F[u, v, y, x_1, x_2, x_3, \dots]/(x_1 y = uv,\ x_2^2 = x_1,\ x_3^2 = x_2, \dots)$. Show that
$u, v$ are irreducible elements in $R$ but that the process of factoring $uv$ need not terminate.

17. Prove Proposition (2.9) and Corollary (2.10). 

18. Prove Proposition (2.11). 

19. Prove that the factorizations (2.22) are prime in Z[i]. 


20. The discussion of unique factorization involves only the multiplication law on the ring $R$,
so it ought to be possible to extend the definitions. Let $S$ be a commutative semigroup,
meaning a set with a commutative and associative law of composition and with an identity.
Suppose the Cancellation Law holds in $S$: If $ab = ac$ then $b = c$. Make the appropriate
definitions so as to extend Proposition (2.8) to this situation.

Given elements v,,... v, in Z*, we can define a semigroup S as the set of all linear com- 
binations of (v,,..., Un) with nonnegative integer coefficients, the law of composition be- 
ing addition. Determine which of these semigroups has unique factorization. 


3. Gauss’s Lemma 


1. Let $a, b$ be elements of a field $F$, with $a \ne 0$. Prove that a polynomial $f(x) \in F[x]$ is
irreducible if and only if $f(ax + b)$ is irreducible.

2. Let $F = \mathbb{C}(x)$, and let $f, g \in \mathbb{C}[x, y]$. Prove that if $f$ and $g$ have a common factor in
$F[y]$, then they also have a common factor in $\mathbb{C}[x, y]$.

3. Let $f$ be an irreducible polynomial in $\mathbb{C}[x, y]$, and let $g$ be another polynomial. Prove that
if the variety of zeros of $g$ in $\mathbb{C}^2$ contains the variety of zeros of $f$, then $f$ divides $g$.

4. Prove that two integer polynomials are relatively prime in $\mathbb{Q}[x]$ if and only if the ideal
they generate in $\mathbb{Z}[x]$ contains an integer.

5. Prove Gauss's Lemma without reduction modulo $p$, in the following way: Let $a_i$ be the
coefficient of lowest degree $i$ of $f$ which is not divisible by $p$. So $p$ divides $a_\nu$ if $\nu < i$, but
$p$ does not divide $a_i$. Similarly, let $b_j$ be the coefficient of lowest degree of $g$ which is not
divisible by $p$. Prove that the coefficient of $h$ of degree $i + j$ is not divisible by $p$.

6. State and prove Gauss's Lemma for Euclidean domains.

7. The cubic polynomial $f(x) = x^3 + a_2x^2 + a_1x + a_0 \in \mathbb{C}[x]$ can be described by the
point $a = (a_0, a_1, a_2) \in \mathbb{C}^3$. Prove that the locus of points which correspond to reducible
cubic polynomials is a subvariety of $\mathbb{C}^3$.

8. Prove that an integer polynomial is primitive if and only if it is not contained in any of
the kernels of the maps (3.2).

9. Prove that $\det \begin{bmatrix} x & y \\ z & w \end{bmatrix}$ is irreducible in the polynomial ring $\mathbb{C}[x, y, z, w]$.

10. Prove that the kernel of the homomorphism $\mathbb{Z}[x] \to \mathbb{R}$ sending $x \mapsto 1 + \sqrt{2}$ is a
principal ideal, and find a generator for this ideal.

11. (a) Consider the map $\psi: \mathbb{C}[x, y] \to \mathbb{C}[t]$ defined by $f(x, y) \mapsto f(t^2, t^3)$. Prove that its
kernel is a principal ideal, and that its image is the set of polynomials $p(t)$ such that
$p'(0) = 0$.
(b) Consider the map $\varphi: \mathbb{C}[x, y] \to \mathbb{C}[t]$ defined by $f(x, y) \mapsto f(t^2 - t, t^3 - t^2)$.
Prove that $\ker \varphi$ is a principal ideal, and that its image is the set of polynomials $p(t)$
such that $p(0) = p(1)$. Give an intuitive explanation in terms of the geometry of the
variety $\{f = 0\}$ in $\mathbb{C}^2$.


4. Explicit Factorization of Polynomials 


LE 


Prove that the following polynomials are irreducible in Q[x]. 
(a) eee le bex? + Gecel2 “(esx — Gxt 1 (d) x* + 6x? + 7 
(E) oes 


. Factor x5 + 5x + 5 into irreducible factors in Q[x] and in F,[x]. 
. Factor x? + x + 1 in F,[x], when p = 2,3,5. 


. Factor x4 + x? + 1 into irreducible factors in Q[x].
. Suppose that a polynomial of the form x* + bx? + c is a product of two quadratic fac- 


tors in Q[x]. What can you say about the coefficients of these factors? 


. Prove that the following polynomials are irreducible. 


(a) x2 + x + Ll inthe field F, (b) x? + 1inF, (c) x? — 9 in Fs, 


. Factor the following polynomials into irreducible factors in Q[x]. 


(a) x° — 3x —2 () x°— 3x42 (C) x — 6x’ Foy 5 


. Let p be a prime integer. Prove that the polynomial x” — p is irreducible in Q[x]. 
. Using reduction modulo 2 as an aid, factor the following polynomials in Q[x]. 


(a) x2 + 2345x + 125 (b) x2 + 5x2 + 10x +5 (c) x2 + 2x? + 3x4 1 

(ay x* + 2x7 + 2y2 4+ 2b 2 elite 2x? + 3x2 2 

(ix) poy, 4x? eee Le aA 2x ea 

10. Let $p$ be a prime integer, and let $f \in \mathbb{Z}[x]$ be a polynomial of degree $2n + 1$, say
$f(x) = a_{2n+1}x^{2n+1} + \cdots + a_1x + a_0$. Suppose that $a_{2n+1} \not\equiv 0$ (modulo $p$),
$a_0, a_1, \dots, a_n \equiv 0$ (modulo $p^2$), $a_{n+1}, \dots, a_{2n} \equiv 0$ (modulo $p$), $a_0 \not\equiv 0$ (modulo $p^3$). Prove
that $f$ is irreducible in $\mathbb{Q}[x]$.

11. Let $p$ be a prime, and let $A$ be an $n \times n$ integer matrix such that $A^p = I$ but $A \ne I$.
Prove that $n \ge p - 1$.

12. Determine the monic irreducible polynomials of degree 3 over $\mathbb{F}_3$.

13. Determine the monic irreducible polynomials of degree 2 over $\mathbb{F}_5$.

14. Lagrange interpolation formula:
(a) Let $x_0, \dots, x_n$ be distinct complex numbers. Determine a polynomial $p(x)$ of degree $n$
which is zero at $x_1, \dots, x_n$ and such that $p(x_0) = 1$.
(b) Let $x_0, \dots, x_d$; $y_0, \dots, y_d$ be complex numbers, and suppose that the $x_i$ are all different.
There is a unique polynomial $g(x) \in \mathbb{C}[x]$ of degree $\le d$, such that $g(x_i) = y_i$ for
each $i = 0, \dots, d$. Prove this by determining the polynomial $g$ explicitly in terms of
$x_i, y_i$.

*15. Use the Lagrange interpolation formula to give a method of finding all integer polynomial
factors of an integer polynomial in a finite number of steps.

16. Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ be a monic polynomial with integer
coefficients, and let $r \in \mathbb{Q}$ be a rational root of $f(x)$. Prove that $r$ is an integer.

17. Prove that the polynomial $x^2 + y^2 - 1$ is irreducible by the method of undetermined
coefficients, that is, by studying the equation $(ax + by + c)(a'x + b'y + c') =
x^2 + y^2 - 1$, where $a, b, c, a', b', c'$ are unknown.


5. Primes in the Ring of Gauss Integers 


1. Prove that every Gauss prime divides exactly one integer prime.

2. Factor 30 into primes in $\mathbb{Z}[i]$.

3. Factor the following into Gauss primes.
(a) $1 - 3i$ (b) 10 (c) $6 + 9i$

4. Make a neat drawing showing the primes in the ring of Gauss integers in a reasonable
size range.

5. Let $\pi$ be a Gauss prime. Prove that $\pi$ and $\bar{\pi}$ are associate if and only if either $\pi$ is
associate to an integer prime or $\pi\bar{\pi} = 2$.

6. Let $R$ be the ring $\mathbb{Z}[\sqrt{3}]$. Prove that a prime integer $p$ is a prime element of $R$ if and only
if the polynomial $x^2 - 3$ is irreducible in $\mathbb{F}_p[x]$.

7. Describe the residue ring $\mathbb{Z}[i]/(p)$ in each case.
(a) $p = 2$ (b) $p \equiv 1$ (modulo 4) (c) $p \equiv 3$ (modulo 4)

*8. Let $R = \mathbb{Z}[\zeta]$, where $\zeta = \frac{1}{2}(-1 + \sqrt{-3})$ is a complex cube root of 1. Let $p$ be an integer
prime $\ne 3$. Adapt the proof of Theorem (5.1) to prove the following.
(a) The polynomial $x^2 + x + 1$ has a root in $\mathbb{F}_p$ if and only if $p \equiv 1$ (modulo 3).
(b) $(p)$ is a prime ideal of $R$ if and only if $p \equiv -1$ (modulo 3).
(c) $p$ factors in $R$ if and only if it can be written in the form $p = a^2 + ab + b^2$, for
some integers $a, b$.
(d) Make a drawing showing the primes of absolute value < 10 in $R$.


6. Algebraic Integers 


1. Is $\frac{1}{2}(1 + \sqrt{3})$ an algebraic integer?

2. Let $\alpha$ be an algebraic integer whose monic irreducible polynomial over $\mathbb{Z}$ is
$x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$, and let $R = \mathbb{Z}[\alpha]$. Prove that $\alpha$ is a unit in $R$ if and
only if $a_0 = \pm 1$.

3. Let $d, d'$ be distinct square-free integers. Prove that $\mathbb{Q}(\sqrt{d})$ and $\mathbb{Q}(\sqrt{d'})$ are different
subfields of $\mathbb{C}$.

4. Prove that existence of factorizations is true in the ring of integers in an imaginary
quadratic number field.

5. Let $\alpha$ be the real cube root of 10, and let $\beta = a + b\alpha + c\alpha^2$, with $a, b, c \in \mathbb{Q}$. Then
$\beta$ is the root of a monic cubic polynomial $f(x) \in \mathbb{Q}[x]$. The irreducible polynomial for $\alpha$
over $\mathbb{Q}$ is $x^3 - 10$, and its three roots are $\alpha$, $\alpha' = \zeta\alpha$, and $\alpha'' = \zeta^2\alpha$, where
$\zeta = e^{2\pi i/3}$. The three roots of $f$ are $\beta$, $\beta' = a + b\zeta\alpha + c\zeta^2\alpha^2$, and
$\beta'' = a + b\zeta^2\alpha + c\zeta\alpha^2$, so $f(x) = (x - \beta)(x - \beta')(x - \beta'')$.

(a) Determine $f$ by expanding this product. The terms involving $\alpha$ and $\alpha^2$ have to cancel
out, so they need not be computed.
(b) Determine which elements $\beta$ are algebraic integers.

6. Prove Proposition (6.17).

7. Prove that the ring of integers in an imaginary quadratic field is a maximal subring of $\mathbb{C}$
with the property of being a lattice in the complex plane.

8. (a) Let $S = \mathbb{Z}[\alpha]$, where $\alpha$ is a complex root of a monic polynomial of degree 2. Prove
that $S$ is a lattice in the complex plane.
(b) Prove the converse: A subring $S$ of $\mathbb{C}$ which is a lattice has the form given in (a).

9. Let $R$ be the ring of integers in the field $\mathbb{Q}[\sqrt{d}]$.

(a) Determine the elements $\alpha \in R$ such that $R = \mathbb{Z}[\alpha]$.
(b) Prove that if $R = \mathbb{Z}[\alpha]$ and if $\alpha$ is a root of the polynomial $x^2 + bx + c$ over $\mathbb{Q}$,
then the discriminant $b^2 - 4c$ is $D$ (6.18).


7. Factorization in Imaginary Quadratic Fields 


1. Prove Proposition (7.3) by arithmetic. 
2. Prove that the elements $2, 3, 1 + \sqrt{-5}, 1 - \sqrt{-5}$ are irreducible elements of the ring
$\mathbb{Z}[\sqrt{-5}]$.

. Let d = —5. Determine whether or not the lattice of integer linear combinations of the 


given vectors is an ideal. 
(a) (6,1 + 6) (b) (7,1 +6) (©) @ = 20282) 28, 6 + 40) 


. Let A be an ideal of the ring of integers R in an imaginary quadratic field. Prove that 


there is a lattice basis for A one of whose elements is a positive integer. 


. Let R = Z[V—-5]. Prove that the lattice spanned by (3,1 + V-5) is an ideal in R, de- 


termine its nonzero element of minimal absolute value, and verify that this ideal has the 
form (7.9), Case 2. 


. With the notation of (7.9), show that if a is an element of R such that (a + a6) is also 


in R, then (a,}(a@ + @6)) is a lattice basis of an ideal. 


. For each ring R listed below, use the method of Proposition (7.9) to describe the ideals in 


R. Make a drawing showing the possible shapes of the lattices in each case. 
(a) R = Z[V-3] (b) R = Zi3(1 + V-3)] (©) R=ZV-6] @) R= Z[V-7] 
(e) R= Z[i(1 + V-7)] (ff) R = ZAV-10] 


8. Prove that $R$ is not a unique factorization domain when $d \equiv 2$ (modulo 4) and $d < -2$.
9. Let $d < -3$. Prove that 2 is not a prime element in the ring $\mathbb{Z}[\sqrt{d}]$, but that 2 is
irreducible in this ring.


Ideal Factorization 


. Let R = Z[V -6]. Factor the ideal (6) into prime ideals explicitly. 
. Let5 = V-3 and R = Z[6]. (This is not the ring of integers in the imaginary quadratic 


number field Q[6].) Let A be the ideal (2, 1 + 5). Show that AA is not a principal ideal, 
hence that the Main Lemma is not true for this ring. 

Let R = Z[\V—5]. Determine whether or not 11 is an irreducible element of R and 
whether or not (11) is a prime ideal in R. 

Let 4 = Z[V-6]. Find a lattice basis for the product ideai AB, where A = (2,5) and 
B = (3,8). 


. Prove that A > A’ implies that AB D A’B. 
. Factor the principal ideal (14) into prime ideals explicitly in R = Z[{5], where 


5= V-5. 


. Let P be a prime ideal of an integral domain R, and assume that existence of factoriza- 


tions is true in R. Prove that if a € P then some irreducible factor of a is in P. 


The Relation Between Prime Ideals of R and Prime 
Integers 


Find lattice bases for the prime divisors of 2 and 3 in the ring of integers in (a) Q['V - 14] 

and (b) Q[V -23]. 

Let d = ~—14. For each of the following primes p, determine whether or not p splits or 

ramifies in R, and if so, determine a lattice basis for a prime ideal factor of (p): 

2s, 7; 11, 13: 

(a) Suppose that a prime integer p remains prime in R. Prove that R/(p) is then a field 
with p* elements. 

(b) Prove that if p splits in R, then R/(p) is isomorphic to the product ring Fp X Fp. 
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4. 


Sarna wm 


Let p be a prime which splits in R, say (p) = PP, and let a € P be any element which is 
not divisible by p. Prove that P is generated as an ideal by (p, a). 


- Prove Proposition (9.3b). 
. Ifd = 2 or 3 (modulo 4), then according to Proposition (9.3a) a prime integer p remains 


prime in the ring of integers of Q[Vd] if the polynomial x? — d is irreducible modulo p. 
(a) Prove the same thing when d = 1 (modulo 4) and p # 2. 
(b) What happens to p = 2 in this case? 


. Assume that d = 2 or 3 (modulo 4). Prove that a prime integer p ramifies in R if and 


only if p = 2 or p divides d. 


» State and prove an analogue of problem 7 when d is congruent 1 modulo 4. 
. Let p be an integer prime which ramifies in R, and say that (p) = P?. Find an explicit 


lattice basis for P. In which cases is P a principal ideal? 


. A prime integer might be of the form a? + b?d, with a,b € Z. Discuss carefully how 


this is related to the prime factorization of (p) in R. 


. Prove Proposition (9.1). 


Ideal Classes in Imaginary Quadratic Fields 


. Prove that the ideals A and A’ are similar if and only if there is a nonzero ideal C such 


that AC and A'C are principal ideals. 


. The estimate of Corollary (10.12) can be improved to |a |? < 2A(L)/V3, by studying 


Y 


lattice points in a circle rather than in an arbitrary centrally symmetric convex set. Work 

this out. 

3. Let $R = \mathbb{Z}[\delta]$, where $\delta^2 = -6$.

(a) Prove that the lattices $P = (2, \delta)$ and $Q = (3, \delta)$ are prime ideals of $R$.

(b) Factor the principal ideal (6) into prime ideals explicitly in $R$.

(c) Prove that the ideal classes of $P$ and $Q$ are equal.

(d) The Minkowski bound for $R$ is $[\mu] = 3$. Using this fact, determine the ideal class
group of $R$.


. In each case, determine the ideal class group and draw the possible shapes of the lattices. 


(a) d=-10 (b)d=-13 (c)d=-14 (I) d=-15 (&) d=-17 
(f) d = -21 


. Prove that the values of d listed in Theorem (7.7) have unique factorization. 
. Prove Lemma (10.13). 
. Derive Corollary (10.14) from Lemma (10.13). 


Verify Table (10.24). 


11. Real Quadratic Fields


1. Let $R = \mathbb{Z}[\delta]$, $\delta = \sqrt{2}$. Define a size function on $R$ using the lattice embedding (11.2):
$\sigma(a + b\delta) = |a^2 - 2b^2|$. Prove that this size function makes $R$ into a Euclidean domain.

2. Let $R$ be the ring of integers in a real quadratic number field, with $d \equiv 2$ or 3 (modulo
4). According to (6.14), $R$ has the form $\mathbb{Z}[x]/(x^2 - d)$. We can also consider the
ring $R' = \mathbb{R}[x]/(x^2 - d)$, which contains $R$ as a subring.

(a) Show that the elements of $R'$ are in bijective correspondence with points of $\mathbb{R}^2$ in
such a way that the elements of $R$ correspond to lattice points.

(b) Determine the group of units of $R'$. Show that the subset $U'$ of $R'$ consisting of the
points on the two hyperbolas $xy = \pm 1$ forms a subgroup of the group of units.

(c) Show that the group of units $U$ of $R$ is a discrete subgroup of $U'$, and show that the
subgroup $U_0$ of units which are in the first quadrant is an infinite cyclic group.

(d) What are the possible structures of the group of units $U$?


3. Let $U_0$ denote the group of units of $R$ which are in the first quadrant in the embedding
(11.2). Find a generator for $U_0$ when (a) $d = 3$, (b) $d = 5$.

4. Prove that if $d$ is a square > 1 then the equation $x^2 - y^2d = 1$ has no solution except
$x = \pm 1$, $y = 0$.

5. Draw a figure showing the hyperbolas and the units in a reasonable size range for d = 3. 

12. Some Diophantine Equations 

1. Determine the primes $p$ such that $x^2 + 5y^2 = 2p$ has a solution.

2. Express the assertion of Theorem (12.10) in terms of congruence modulo 20.

3. Prove that if $x^2 \equiv -5$ (modulo $p$) has a solution, then there is an integer point on one of
the two ellipses $x^2 + 5y^2 = p$ or $2x^2 + 2xy + 3y^2 = p$.

4. Determine the conditions on the integers $a, b, c$ such that the linear Diophantine equation
$ax + by = c$ has an integer solution, and if it does have one, find all the solutions.

5. Determine the primes $p$ such that the equation $x^2 + 2y^2 = p$ has an integer solution.

6. Determine the primes $p$ such that the equation $x^2 + xy + y^2 = p$ has an integer solution.

7. Prove that if the congruence $x^2 \equiv -10$ (modulo $p$) has a solution, then the equation
$x^2 + 10y^2 = p^2$ has an integer solution. Generalize.

8. Find all integer solutions of the equation $x^2 + 2 = y^3$.


. Solve the following Diophantine equations. 


(a) y27+10=x° (b) y?+1=x? (ce) y?+2=*%° 


Miscellaneous Problems 


- Prove that there are infinitely many primes congruent | modulo 4. 
. Prove that there are infinitely many primes congruent to —1 (modulo 6) by studying the 


factorization of the integer p,p2--: p, — 1, where p,,..., pr are the first r primes. 


. Prove that there are infinitely many primes congruent to —1 (modulo 4). 
- (a) Determine the prime ideals of the polynomial ring C[.x, y] in two variables. 


(b) Show that unique factorization of ideals does not hold in the ring C[x, y]. 


. Relate proper factorizations of elements in an integral domain to proper factorizations of 


principal ideals. Using this relation, state and prove unique factorization of ideals in a 
principal ideal domain. 


- Let R be a domain, and let / be an ideal which is a product of distinct maximal ideals in 


two ways, say / = P,-:- P, = Q,:+-Q,. Prove that the two factorizations are the same, 
except for the ordering of the terms. 


- Let R be a ring containing Z as a subring. Prove that if integers m,n are contained in a 


proper ideal of R, then they have a common integer factor > 1. 




*8. (a) Let $\theta$ be an element of the group $R^+/Z^+$. Use the Pigeonhole Principle [Appendix
(1.6)] to prove that for every integer n there is an integer b < n such that
$|b\theta| < 1/bn$.

(b) Show that for every real number r and every $\epsilon > 0$, there is a fraction m/n such that
$|r - m/n| \le \epsilon/n$.

(c) Extend this result to the complex numbers by showing that for every complex number
$\alpha$ and every real number $\epsilon > 0$, there is an element of Q(i), say $\beta = (a + bi)/n$
with $a, b, n \in Z$, such that $|\alpha - \beta| < \epsilon/n$.

(d) Let $\epsilon$ be a positive real number, and for each element $\beta = (a + bi)/n$ of Q(i),
$a, b, n \in Z$, consider the disc of radius $\epsilon/n$ about $\beta$. Prove that the interiors of these
discs cover the complex plane.

(e) Extend the method of Proposition (7.9) to prove the finiteness of the class number
for any imaginary quadratic field.


*9. (a) Let R be the ring of functions which are polynomials in cos t and sin t, with real
coefficients. Prove that $R \approx \mathbb{R}[x, y]/(x^2 + y^2 - 1)$.
(b) Prove that R is not a unique factorization domain.
*(c) Prove that $C[x, y]/(x^2 + y^2 - 1)$ is a principal ideal domain and hence a unique
factorization domain.


*10. In the definition of a Euclidean domain, the size function $\sigma$ is assumed to have as range
the set of nonnegative integers. We could generalize this by allowing the range to be
some other ordered set. Consider the product ring $R = C[x] \times C[y]$. Show that we can
define a size function $R - \{0\} \to S$, where S is the ordered set
$\{0, 1, 2, 3, \ldots; \omega, \omega + 1, \omega + 2, \omega + 3, \ldots\}$, so that the division algorithm holds.

*11. Let $\varphi: C[x, y] \to C[t]$ be a homomorphism, defined say by $x \mapsto x(t)$, $y \mapsto y(t)$.
Prove that if x(t) and y(t) are not both constant, then ker $\varphi$ is a nonzero principal ideal.
Modules


Be wise! Generalize! 


Piccayune Sentinel 


1. THE DEFINITION OF A MODULE 


Let R be a commutative ring. An R-module V is an abelian group with law of
composition written +, together with a scalar multiplication $R \times V \to V$, written
$r, v \mapsto rv$, which satisfies these axioms:


(1.1)
(i) $1v = v$,
(ii) $(rs)v = r(sv)$,
(iii) $(r + s)v = rv + sv$,
(iv) $r(v + v') = rv + rv'$,


for all $r, s \in R$ and $v, v' \in V$. Notice that these are precisely the axioms for a vector
space. An F-module is just an F-vector space, when F is a field. So modules are
the natural generalizations of vector spaces to rings. But the fact that elements of a
ring needn't be invertible makes modules more complicated.

The most obvious examples are the modules $R^n$ of R-vectors, that is, row or
column vectors with entries in the ring. The laws of composition for R-vectors are
the same as for vectors with entries in a field:


$$
\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix}
\quad\text{and}\quad
r \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} ra_1 \\ \vdots \\ ra_n \end{bmatrix}.
$$


The modules thus defined are called free modules. But when R is not a field, it is no
longer true that these are the only modules. There will be modules which are not
isomorphic to any free module, though they are spanned by a finite set.

Let us examine the concept of module in the case that R is the ring of integers 
Z. Any abelian group V, its law of composition written additively, can be made into 
a module over Z in exactly one way, by the rules 


$nv = v + \cdots + v$ = “n times v”


and $(-n)v = -(nv)$, for any positive integer n. These rules are forced on us by axioms
(1.1), starting with $1v = v$, and they do make V into a Z-module; in other
words, the axioms (1.1) hold. This is intuitively very plausible. To make a formal 
proof, we would go back to Peano’s axioms. Conversely, any Z-module has the 
structure of an abelian group, given by forgetting about its scalar multiplication. 
Thus 


(1.2) abelian group and Z-module are equivalent concepts. 


We must use additive notation in the abelian group in order to make this correspon- 
dence seem natural. 

The ring of integers provides us with examples to show that modules over a
ring R need not be free. No finite abelian group except the zero group is isomorphic
to a free module $Z^n$, because $Z^n$ is infinite if n > 0 and $Z^0 = 0$.
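
As a small illustration of the rule "n times v", the following Python sketch treats the cyclic group Z/6Z as a Z-module. The function name `scalar_mul` and the choice of modulus 6 are assumptions made for this example, not notation from the text. It computes nv by repeated addition and shows that a nonzero integer can send a nonzero element to zero, which never happens in a free module $Z^n$.

```python
def scalar_mul(n: int, v: int, modulus: int) -> int:
    """Return n*v in Z/modulus, computed as 'n times v' (repeated addition)."""
    if n < 0:
        # (-n)v = -(nv), as in the text
        return (-scalar_mul(-n, v, modulus)) % modulus
    total = 0
    for _ in range(n):
        total = (total + v) % modulus
    return total


MODULUS = 6  # hypothetical choice: the finite abelian group Z/6Z

# The axiom 1v = v holds for every element.
assert all(scalar_mul(1, v, MODULUS) == v for v in range(MODULUS))

# The nonzero scalar 3 kills the nonzero element 4, since 4 + 4 + 4 = 12 = 0 in Z/6Z.
print(scalar_mul(3, 4, MODULUS))  # 0

# The scalar 6 kills every element of the group.
print([scalar_mul(6, v, MODULUS) for v in range(MODULUS)])  # [0, 0, 0, 0, 0, 0]
```
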

The remainder of this section extends some of our basic terminology to mod- 
ules. A submodule of an R-module V is a nonempty subset which is closed under ad- 
dition and scalar multiplication. We have seen submodules in one case before, 
namely ideals. 


(1.3) Proposition. The submodules of the R-module $R^1$ are the ideals of R.


Proof. By definition, an ideal is a subset of R which is closed under addition
and under multiplication by elements of R. □


The definition of homomorphism of R-modules copies that of linear transformation
of vector spaces. A homomorphism $\varphi: V \to W$ of R-modules is a map which
is compatible with the laws of composition


(1.4) $\varphi(v + v') = \varphi(v) + \varphi(v')$ and $\varphi(rv) = r\varphi(v)$,


for all $v, v' \in V$ and $r \in R$. A bijective homomorphism is called an isomorphism.
The kernel of a homomorphism $\varphi: V \to W$ is a submodule of V, and the image of
$\varphi$ is a submodule of W.

The proof given for vector spaces [Chapter 4 (2.1)] shows that every homomorphism
$\varphi: R^m \to R^n$ of free modules is left multiplication by a matrix whose
entries are in R.
We also need to extend the concept of quotient group to modules. Let R be a
ring, and let W be a submodule of an R-module V. The quotient V/W is the additive
group of cosets [Chapter 2 (9.5)] $\bar{v} = v + W$. It is made into an R-module by the
rule


(1.5) $r\bar{v} = \overline{rv}$.


We have made such constructions several times before. The facts we will need are 
collected together below. 


(1.6) Proposition. 


(a) The rule (1.5) is well-defined, and it makes $\bar{V} = V/W$ into an R-module.

(b) The canonical map $\pi: V \to \bar{V}$ sending $v \mapsto \bar{v}$ is a surjective homomorphism
of R-modules, and its kernel is W.

(c) Mapping property: Let $f: V \to V'$ be a homomorphism of R-modules whose
kernel contains W. There is a unique homomorphism $\bar{f}: \bar{V} \to V'$ such that
$f = \bar{f}\pi$.

(d) First Isomorphism Theorem: If ker f = W, then $\bar{f}$ is an isomorphism from $\bar{V}$ to
the image of f.

(e) Correspondence Theorem: There is a bijective correspondence between submodules
$\bar{S}$ of $\bar{V}$ and submodules S of V which contain W, defined by
$S = \pi^{-1}(\bar{S})$ and $\bar{S} = \pi(S)$. If S and $\bar{S}$ are corresponding modules, then V/S is
isomorphic to $\bar{V}/\bar{S}$.


We already know the analogous facts for groups and normal subgroups. All that re- 
mains to be checked in each part is that scalar multiplication is well-defined, satisfies 
the axioms for a module, and is compatible with the maps. These verifications follow
the pattern set previously. □


2. MATRICES, FREE MODULES, AND BASES 


Matrices with entries in a ring can be manipulated in the same way as matrices with 
entries in a field. That is, the operations of matrix addition and multiplication are 
defined as in Chapter 1, and they satisfy similar rules. A matrix with entries in a 
ring R is often called an R-matrix. 

Let us ask which R-matrices are invertible. The determinant of an n × n R-matrix
$A = (a_{ij})$ can be computed by any of the old rules. It is convenient to use the
complete expansion [Chapter 1 (4.12)], because it exhibits the determinant as a
polynomial in the $n^2$ matrix entries. So we write


(2.1) $\det A = \sum_p \pm\, a_{1p(1)} \cdots a_{np(n)}$,


the sum being over all permutations p of the set {1,...,n}, and the symbol ± standing
for the sign of the permutation. Evaluating this formula on an R-matrix, we obtain
an element of R. The usual rules for determinant apply, in particular


det AB = (det A)(det B). 


We have proved this rule when the matrix entries are in a field [Chapter 1 (3.16)], 
and we will discuss the reason that such formulas carry over to rings in the next sec- 
tion. Let us assume for now that they do carry over. 

If A has a multiplicative inverse $A^{-1}$ with entries in R, then


$(\det A)(\det A^{-1}) = \det I = 1$.


This shows that the determinant of an invertible R-matrix is a unit of the ring.
Conversely, let A be an R-matrix whose determinant $\delta$ is a unit. Then we can find its
inverse by Cramer's Rule: $\delta I = A(\operatorname{adj} A)$, where the adjoint matrix is calculated from
A by taking determinants of minors [Chapter 1 (5.4)]. This rule also holds in any
ring. So if $\delta$ is a unit, we can solve for $A^{-1}$ in R as


$A^{-1} = \delta^{-1}(\operatorname{adj} A)$.


(2.2) Corollary. The invertible n × n matrices A with entries in R are those matrices
whose determinant is a unit. They form a group


$GL_n(R)$ = {invertible n × n R-matrices},


called the general linear group over R. □
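
For R = Z the corollary can be tried out directly. The sketch below is only an illustration under stated assumptions: the helper names `minor`, `det`, `adjugate` and `inverse_over_Z` are invented here, and the determinant is computed by expansion along the first row, which is practical only for small matrices. It returns the integer inverse $\delta^{-1}(\operatorname{adj} A)$ when the determinant is a unit of Z, and `None` otherwise.

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by expansion along the first row (fine for small n)."""
    if len(a) == 0:
        return 1
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def adjugate(a):
    """Adjoint matrix: entry (i, j) is the (j, i) cofactor of a."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]


def inverse_over_Z(a):
    """Return the inverse with integer entries, or None if det a is not +1 or -1."""
    d = det(a)
    if d not in (1, -1):
        return None
    # For d = +1 or -1 the inverse of d in Z is d itself.
    return [[d * x for x in row] for row in adjugate(a)]


print(inverse_over_Z([[2, 1], [1, 1]]))   # [[1, -1], [-1, 2]]
print(inverse_over_Z([[2, -1], [1, 2]]))  # None, since the determinant is 5
```

The second matrix is invertible as a real matrix, but its inverse has denominators 5, so it does not lie in $GL_2(Z)$.
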


The fact that the determinant of an invertible matrix must be a unit is a strong
condition on the matrix when R has few units. For instance, if R is the ring of
integers, the determinant must be ±1. Most integer matrices are invertible real
matrices, so they are in $GL_n(\mathbb{R})$. But unless the determinant is ±1, the entries of the
inverse matrix won't be integers, so the inverses will not be in $GL_n(Z)$. Nevertheless,
there are always reasonably many invertible matrices if n > 1, because the elementary
matrices


$$
I + a e_{ij} = \begin{bmatrix} 1 & & & \\ & \ddots & a & \\ & & \ddots & \\ & & & 1 \end{bmatrix}, \quad i \ne j, \ a \in R,
$$


have determinant 1. These matrices generate a good-sized group. The other elementary
matrices, the transposition matrices and the matrices


$$
\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & u & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}, \quad u \text{ a unit in } R,
$$


are also invertible.
We now return to the discussion of modules over a ring R. The concepts of basis
and independence (Chapter 3, Section 3) can be carried over from vector spaces
to modules without change: An ordered set $(v_1, \ldots, v_k)$ of elements of a module V is
said to generate (or span) V if every $v \in V$ is a linear combination:

(2.3) $v = r_1 v_1 + \cdots + r_k v_k$, with $r_i \in R$.


In that case the elements v; are called generators. A module V is said to be finitely 
generated if there exists a finite set of generators. Most of the modules we study will 
be finitely generated. A Z-module V is finitely generated if and only if it is a finitely 
generated abelian group in the sense of Chapter 6, Section 8. 

We saw in Section 1 that modules needn't be isomorphic to any of the modules
$R^k$. However, a given module may happen to be, and if so, it is called a free module
too. Thus a finitely generated module V is free if there is an isomorphism


$\varphi: R^n \to V$.


For instance, lattices in $\mathbb{R}^2$ are free Z-modules, whereas finite, nonzero abelian
groups are not free. A free Z-module is also called a free abelian group. Free mod- 
ules form an important and natural class, and we will study them first. We will study 
general modules beginning in Section 5. 

Following the definitions for vector spaces, we call a set of elements
$(v_1, \ldots, v_n)$ of a module V independent if no nontrivial linear combination is zero,
that is, if the following condition holds:


(2.4) If $r_1 v_1 + \cdots + r_n v_n = 0$, with $r_i \in R$, then $r_i = 0$ for i = 1,...,n.


The set is a basis if it is both independent and a generating set. The standard basis
$E = (e_1, \ldots, e_k)$ is a basis of $R^k$. Exactly as with vector spaces, $(v_1, \ldots, v_k)$ is a basis
if every $v \in V$ is a linear combination (2.3) in a unique way.

We may also speak of linear combinations and linear independence of infinite 
sets, using the terminology of Chapter 3, Section 5. 

Let us denote the ordered set $(v_1, \ldots, v_n)$ by B, as in Chapter 3, Section 3. Then
multiplication by B,


$$
BX = (v_1, \ldots, v_n) \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = v_1 x_1 + \cdots + v_n x_n,
$$

defines a homomorphism of modules

(2.5) $\varphi: R^n \to V$.

This homomorphism is surjective if and only if the set $(v_1, \ldots, v_n)$ generates V, and
injective if and only if it is independent. Thus it is bijective if and only if B is a basis
of V, in which case V is a free module. So a module V has a basis if and only if it is
free. Most modules have no bases.
Computation with bases of free R-modules can be done in much the same way
as with bases of vector spaces, using matrices with entries in R. In particular, we
can speak of the coordinate vector of an element $v \in V$, with respect to a basis
$B = (v_1, \ldots, v_n)$. It is the unique column vector $X \in R^n$ such that


$v = BX = v_1 x_1 + \cdots + v_n x_n$.


If two bases $B = (v_1, \ldots, v_n)$ and $B' = (v_1', \ldots, v_r')$ for the same free module V
are given, then the matrix of change of basis is obtained as in Chapter 3, Section 4
by writing the elements $v_j$ of the first basis as linear combinations of the second
basis: $B = B'P$, or


(2.6) $v_j = \sum_i v_i' p_{ij}$.


As with vector spaces, any two bases of the same free module have the same
cardinality, provided that R is not the zero ring. Thus n = r in
the above bases. This can be proved by considering the inverse matrix $Q = (q_{ij})$
which is obtained by writing B' in terms of B: $B' = BQ$. Then


$B = B'P = BQP$.


Since B is a basis, there is only one way to write $v_j$ as a linear combination of
$(v_1, \ldots, v_n)$, and that is $v_j = 1v_j$, or $B = BI$. Therefore $QP = I$, and similarly
$PQ = I$: The matrix of change of basis is an invertible R-matrix.

Now P is an r × n matrix, and Q is an n × r matrix. Suppose that r > n. Then
we make P and Q square by adding zeros:


$$
\begin{bmatrix} P & 0 \end{bmatrix}, \qquad \begin{bmatrix} Q \\ 0 \end{bmatrix}.
$$


This does not change the product PQ. But the determinants of these square matrices
are zero, so they are not invertible, because $R \neq 0$. This shows that r = n, as
claimed.

It is a startling fact that there exist noncommutative rings R for which the modules
$R^n$ for n = 1, 2, 3,... are all isomorphic (see miscellaneous exercise 6).
Determinants do not work well unless the matrix entries commute.

Unfortunately, most concepts relating to vector spaces have different names 
when used for modules over rings, and it is too late to change them. The number of 
elements of a basis for a free module V is called the rank of V, instead of the dimen- 
sion. 

As we have already remarked, every homomorphism $\varphi: R^n \to R^m$ between
column vectors is left multiplication by a matrix A. If $\varphi: V \to W$ is a homomorphism
of free R-modules with bases $B = (v_1, \ldots, v_n)$ and $C = (w_1, \ldots, w_m)$ respectively,
then the matrix of the homomorphism is defined to be $A = (a_{ij})$, where


(2.8) $\varphi(v_j) = \sum_i w_i a_{ij}$


as before [Chapter 4 (2.3)]. A change of the bases B, C by invertible R-matrices P, Q
changes the matrix of $\varphi$ to $A' = QAP^{-1}$ [Chapter 4 (2.7)].


3. THE PRINCIPLE OF PERMANENCE OF IDENTITIES 


In this section, we address the following question: Why do the properties of ma- 
trices with entries in a field continue to hold when the entries are in an arbitrary 
ring? Briefly, the reason is that they are identities, which means that they hold when 
the matrix entries are replaced by variables. To be more precise, assume we want to 
prove some identity such as the multiplicative property of the determinant, 
(det A)(det B) = det(AB), or Cramer’s Rule. Suppose that we have already checked 
the identity for matrices with complex entries. We don’t want to do the work again, 
and anyhow we may have used special properties of C, such as the field axioms, the 
fact that every complex polynomial has a root, or the fact that C has characteristic 
zero, to check the identity there. We did use special properties to prove the identi- 
ties mentioned, so the proofs we gave will not work for rings. We are now going to 
show how to deduce such identities for all rings from the same identities for the 
complex numbers. 

The principle is very general, but in order to focus attention, let us concentrate 
on the identity (det A)(det B) = det(AB). We begin by replacing the matrix entries 
with variables. So we consider the same identity 


$(\det X)(\det Y) = \det(XY)$,


where X and Y denote n × n matrices with variable entries. Then we can substitute
elements in any ring R for these variables. Formally, the substitution is defined in
terms of the ring of integer polynomials $Z[\{x_{ij}\}, \{y_{k\ell}\}]$ in $2n^2$ variable matrix entries.
There is a unique homomorphism from the ring of integers to any ring R [Chapter
10 (3.9)]. Given matrices $A = (a_{ij})$, $B = (b_{k\ell})$ with entries in R, there is a homomorphism


(3.1) $Z[\{x_{ij}\}, \{y_{k\ell}\}] \to R$,


the substitution homomorphism, which sends $x_{ij} \mapsto a_{ij}$ and $y_{k\ell} \mapsto b_{k\ell}$ [Chapter 10
(3.4)]. Our variable matrices have entries in the polynomial ring, and it is natural to
say that the homomorphism sends $X \mapsto A$ and $Y \mapsto B$, meaning that the entries of
$X = (x_{ij})$ are mapped to the entries of $A = (a_{ij})$ and so on, by the map.

The general principle we have in mind is this: Suppose we want to prove an 
identity, all of whose terms are polynomials with integer coefficients in the matrix 
entries. Then the terms are compatible with ring homomorphisms: For example, if a 
homomorphism $\varphi: R \to R'$ sends $A \mapsto A'$ and $B \mapsto B'$, then it sends
$\det A \mapsto \det A'$. To see this, note that the complete expansion of the determinant is


$$\det A = \sum_p \pm\, a_{1p(1)} \cdots a_{np(n)},$$

the summation being over all permutations p. Since $\varphi$ is a homomorphism,


$$\varphi(\det A) = \sum_p \pm\, \varphi(a_{1p(1)} \cdots a_{np(n)}) = \sum_p \pm\, a'_{1p(1)} \cdots a'_{np(n)} = \det A'.$$


Obviously, this is a general principle. Consequently, if our identity holds for the
R-matrices A, B, then it also holds for the R'-matrices A', B'.

Now for every pair of matrices A, B, we have the homomorphism (3.1) which
sends $X \mapsto A$ and $Y \mapsto B$. We substitute $Z[\{x_{ij}\}, \{y_{k\ell}\}]$ for R and R for R' in the
principle just described. We conclude that if the identity holds for the variable
matrices X, Y in $Z[\{x_{ij}\}, \{y_{k\ell}\}]$, then it holds for every pair of matrices in any ring R:


(3.2) To prove our identity in general, we need only prove it
for the variable matrices X, Y in the ring $Z[\{x_{ij}\}, \{y_{k\ell}\}]$.


To prove it for variable matrices, we consider the ring of integers as a subring 
of the field of complex numbers, noting the inclusion of polynomial rings 


$Z[\{x_{ij}\}, \{y_{k\ell}\}] \subset C[\{x_{ij}\}, \{y_{k\ell}\}]$.


We may as well check our identity in the bigger ring. Now by hypothesis, our identity
is equivalent to the equality of certain polynomials in the variables $\{x_{ij}\}, \{y_{k\ell}\}, \ldots$.
Let us write the identity as $f(x_{ij}, y_{k\ell}) = 0$. The symbol f may stand for several
polynomials.

We now consider the polynomial function corresponding to the polynomial
$f(x_{ij}, y_{k\ell})$, call it $\tilde{f}(x_{ij}, y_{k\ell})$. If the identity has been proved for all complex matrices,
then it follows that $\tilde{f}(x_{ij}, y_{k\ell})$ is the zero function. We apply the fact [Chapter 10
(3.8)] that a polynomial is determined by the function it defines to conclude that
$f(x_{ij}, y_{k\ell}) = 0$, and we are done.

It is possible to formalize the above discussion and to prove a precise theorem 
concerning the validity of identities in an arbitrary ring. However, even mathemati- 
cians occasionally feel that it isn’t worthwhile making a precise formulation—that it 
is easier to consider each case as it comes along. This is one of those occasions. 
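
The compatibility of polynomial expressions with ring homomorphisms can be checked numerically in a small case. The following Python sketch is an illustration only: it takes the homomorphism Z → Z/6Z (reduction modulo 6), and the names `sign`, `det`, `mat_mul` and `mod6` are assumptions of this example. It evaluates the complete expansion of the determinant in both rings and confirms, for one pair of matrices, that reduction sends det A to det A' and that the multiplicative identity survives in Z/6Z, which is not a field. A finite check of this kind proves nothing in general; the argument above does.

```python
from itertools import permutations


def sign(p):
    """Sign of a permutation given as a tuple, computed by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
                     if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det(a, reduce=lambda x: x):
    """Complete expansion of the determinant; `reduce` maps results into the ring."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term = reduce(term * a[i][p[i]])
        total = reduce(total + term)
    return total


def mat_mul(a, b, reduce=lambda x: x):
    """Matrix product, with entries mapped into the ring by `reduce`."""
    return [[reduce(sum(a[i][k] * b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))] for i in range(len(a))]


def mod6(x):
    """The ring homomorphism Z -> Z/6Z, with residues represented by 0..5."""
    return x % 6


A = [[2, -1], [1, 2]]
B = [[3, 4], [5, 7]]
A_bar = [[mod6(x) for x in row] for row in A]
B_bar = [[mod6(x) for x in row] for row in B]

# The homomorphism sends det A to det A'.
print(mod6(det(A)), det(A_bar, mod6))  # 5 5

# (det A')(det B') = det(A'B') holds in Z/6Z.
lhs = mod6(det(A_bar, mod6) * det(B_bar, mod6))
rhs = det(mat_mul(A_bar, B_bar, mod6), mod6)
print(lhs, rhs)  # 5 5
```
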


4. DIAGONALIZATION OF INTEGER MATRICES


In this section we discuss simplification of an m × n integer matrix $A = (a_{ij})$ by a
succession of elementary operations. We will apply this procedure later to classify 
abelian groups. The same method will work for matrices with entries in a Euclidean 
domain and, with some modification, for matrices with entries in a principal ideal 
domain. 

The best results are obtained if we allow both row and column operations to- 
gether. So we allow these operations: 
(4.1)
(i) add an integer multiple of one row to another, or add an integer multiple of
one column to another;
(ii) interchange two rows or two columns;
(iii) multiply a row or a column by a unit.


Of course, the units in Z are ±1. Any such operation can be made by multiplying A
on the left or right by a suitable elementary integer matrix. The result of a sequence
of these operations will have the form


(4.2) $A' = QAP^{-1}$,


where $Q \in GL_m(Z)$ and $P^{-1} \in GL_n(Z)$ are products of elementary integer matrices.
Needless to say, we could drop the inverse symbol from P. We put it there because
we will want to interpret the operation as a change of basis.

Over a field, any matrix can be brought into the block form


$$
\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}
$$


by such operations [Chapter 4 (2.9)]. We cannot hope for such a result when working
with integers. We can't even do it for 1 × 1 matrices. But we can diagonalize:


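(4.3) Theorem. Let A be an m × n integer matrix. There exist products Q, P of
elementary integer matrices as above, so that $A' = QAP^{-1}$ is diagonal:


$$
A' = \left[\begin{array}{ccc|c} d_1 & & & \\ & d_2 & & 0 \\ & & \ddots & \\ \hline & 0 & & 0 \end{array}\right],
$$

where the diagonal entries $d_i$ are nonnegative and where each diagonal entry divides
the next: $d_1 | d_2$, $d_2 | d_3$, ... .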


Proof. The strategy is to perform a sequence of operations so as to end up with
a matrix


(4.4) $\begin{bmatrix} d_1 & 0 \\ 0 & B \end{bmatrix}$

in which $d_1$ divides every entry of B. When this is done, we work on B. The process
is based on repeated division with remainder. We will describe a systematic method,
though using this method is usually not the quickest way to proceed.
We may assume $A \neq 0$.
Step 1: By permuting rows and columns, move a nonzero entry with smallest absolute
value to the upper left corner. Multiply the first row by −1 if necessary, so that
this upper left entry $a_{11}$ becomes positive.

We now try to clear out the first row and column. Whenever an operation produces
a nonzero entry in the matrix whose absolute value is smaller than $a_{11}$, we go
back to Step 1 and start the whole process over. This is likely to spoil the work we
have done to clear out matrix entries. However, progress is being made because the
size of $a_{11}$ is reduced every time. We will not have to return to Step 1 infinitely
often.


Step 2: Choose a nonzero entry $a_{i1}$ in the first column, with i > 1, and divide by
$a_{11}$:


$a_{i1} = a_{11}q + r$,


where $0 \le r < a_{11}$. Subtract q times (row 1) from (row i). This changes $a_{i1}$ to r.

If $r \neq 0$, we go back to Step 1. If r = 0, we have produced a zero in the first
column. Finitely many repetitions of Steps 1 and 2 result in a matrix in which $a_{i1} = 0$
for all i > 1. Similarly, we may use the analogue of Step 2 for column operations
to clear out the first row, eventually ending up with a matrix in which the only
nonzero entry in the first row and column is $a_{11}$, as required by (4.3). However, $a_{11}$
may not yet divide every entry of the matrix B (4.4).


Step 3: Assume that $a_{11}$ is the only nonzero entry in the first row and column, but
that some entry b of B is not divisible by $a_{11}$. Add the column of A which contains b
to column 1. This produces an entry b in the first column.

We go back to Step 2. Division with remainder will now produce a smaller matrix
entry, sending us back to Step 1. A finite sequence of these steps will produce a
matrix of the form (4.4), allowing us to proceed by induction. □
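
The three steps of the proof can be turned into a short program. The Python sketch below is one possible reading of the systematic method, not a definitive implementation: the function name `diagonalize` is an assumption, the matrices Q and P are not recorded, and the remainders in the first column and first row are all computed before the program decides whether to return to Step 1. It uses only the operations (4.1) and returns the diagonal matrix A'.

```python
def diagonalize(a):
    """Diagonalize an integer matrix (list of rows) by the operations (4.1).

    Returns a diagonal matrix with nonnegative diagonal entries, each dividing the next.
    """
    a = [row[:] for row in a]
    m, n = len(a), (len(a[0]) if a else 0)
    for k in range(min(m, n)):
        while True:
            # Step 1: move a nonzero entry of smallest absolute value to (k, k).
            entries = [(abs(a[i][j]), i, j)
                       for i in range(k, m) for j in range(k, n) if a[i][j] != 0]
            if not entries:
                return a  # the remaining block is zero
            _, i0, j0 = min(entries)
            a[k], a[i0] = a[i0], a[k]
            for row in a:
                row[k], row[j0] = row[j0], row[k]
            if a[k][k] < 0:
                a[k] = [-x for x in a[k]]
            pivot = a[k][k]
            # Step 2: division with remainder, first in column k, then in row k.
            for i in range(k + 1, m):
                q = a[i][k] // pivot
                a[i] = [x - q * y for x, y in zip(a[i], a[k])]
            for j in range(k + 1, n):
                q = a[k][j] // pivot
                for row in a:
                    row[j] -= q * row[k]
            if (any(a[i][k] for i in range(k + 1, m))
                    or any(a[k][j] for j in range(k + 1, n))):
                continue  # a smaller nonzero remainder appeared: back to Step 1
            # Step 3: look for an entry of the block B not divisible by the pivot.
            bad = [(i, j) for i in range(k + 1, m) for j in range(k + 1, n)
                   if a[i][j] % pivot]
            if not bad:
                break
            _, j = bad[0]
            for row in a:
                row[k] += row[j]  # add the column containing b to column k
    return a


print(diagonalize([[2, -1], [1, 2]]))  # [[1, 0], [0, 5]]
print(diagonalize([[2, 0], [0, 3]]))   # [[1, 0], [0, 6]]
```

The second call shows why Step 3 is needed: the matrix is already diagonal, but 2 does not divide 3, and the method replaces the pair (2, 3) by (1, 6).
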


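(4.5) Example. We do not follow the systematic method:

$$
A = \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}
\xrightarrow{\text{column oper}} \begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix}
\xrightarrow{\text{column oper}} \begin{bmatrix} 1 & 0 \\ 3 & 5 \end{bmatrix}
\xrightarrow{\text{row oper}} \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix} = A'.
$$

Here

$$
Q = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}
\quad\text{and}\quad
P^{-1} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}.
$$


Note that the key ingredient in this proof is the division algorithm. The same
proof will work when Z is replaced by any Euclidean domain.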


(4.6) Theorem. Let R be a Euclidean domain, for instance a polynomial ring F[t]
in one variable over a field. Let A be an m × n matrix with entries in R. There are
products Q, P of elementary R-matrices such that $A' = QAP^{-1}$ is diagonal and such
that each diagonal entry of A' divides the next: $d_1 | d_2 | d_3 | \cdots$. If R = F[t], we can
normalize by requiring the polynomials $d_i$ to be monic. □


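(4.7) Example. Diagonalization of a matrix of polynomials:


$$
\begin{bmatrix} t^2-3t+2 & t-2 \\ (t-1)^3 & t^2-3t+2 \end{bmatrix}
\xrightarrow{\text{row oper}}
\begin{bmatrix} t^2-3t+2 & t-2 \\ (t-1)^2 & 0 \end{bmatrix}
\xrightarrow{\text{row oper}}
\begin{bmatrix} -t+1 & t-2 \\ (t-1)^2 & 0 \end{bmatrix}
\xrightarrow{\text{column oper}}
$$

$$
\begin{bmatrix} -1 & t-2 \\ (t-1)^2 & 0 \end{bmatrix}
\xrightarrow{\text{column oper}}
\begin{bmatrix} -1 & 0 \\ (t-1)^2 & (t-1)^2(t-2) \end{bmatrix}
\xrightarrow{\text{row oper}}
\begin{bmatrix} 1 & 0 \\ 0 & (t-1)^2(t-2) \end{bmatrix}.
$$

In both examples, we ended up with 1 in the upper left corner. This isn't surprising.
The matrix entries will often have greatest common divisor 1.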

The diagonalization of integer matrices can be used to describe homomorphisms
between free abelian groups. As we have already remarked (2.8), a homomorphism
$\varphi: V \to W$ of free abelian groups is described by a matrix, once bases
for V and W are chosen. A change of bases in V, W by invertible integer matrices
P, Q changes A to $A' = QAP^{-1}$. So we have proved the following theorem:


(4.8) Theorem. Let $\varphi: V \to W$ be a homomorphism of free abelian groups.
There exist bases of V and W such that the matrix of the homomorphism has the
diagonal form (4.3). □


In the rest of this section, we will investigate the meaning of this theorem for two 
auxiliary groups associated to a homomorphism: its kernel and its image. 

Let $\varphi: Z^n \to Z^m$ be left multiplication by the m × n integer matrix A. The
kernel of $\varphi$ is the subgroup of $Z^n$ of integer solutions of the system of linear
equations


(4.9) $AX = 0$.


These solutions can be read off immediately when the matrix is diagonal: In order
for X to solve the diagonal system $d_1 x_1 = 0, \ldots, d_n x_n = 0$, we must have $x_i = 0$
unless $d_i = 0$, and if $d_i = 0$, then $x_i$ can be arbitrary.

To solve (4.9) in general, we may diagonalize A, say to $A' = QAP^{-1}$, where
Q, P are products of elementary integer matrices. We make the change of variable
$X' = PX$ and solve the diagonal system


$A'X' = QAP^{-1}X' = 0$.


Since Q is invertible, the system of equations QAX = 0 has the same solutions as the
system AX = 0. So the solutions of the original system are $X = P^{-1}X'$.
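
The recipe for the kernel can be sketched in Python as follows. The example is hypothetical: the 1 × 2 matrix A = (2 4), the column operation encoded by `P_INV` (subtract 2 times column 1 from column 2), and the helper names `mat_mul`, `mat_vec` and `kernel_basis` are assumptions made for illustration, and Q is taken to be the identity. Since the jth coordinate of X' is free exactly when $d_j = 0$ (or when the diagonal has no jth entry), the corresponding columns of $P^{-1}$ give a basis of the solutions of AX = 0.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_vec(a, x):
    """Product of a matrix with a column vector given as a list."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]


def kernel_basis(diag_entries, n, p_inv):
    """Basis of {X : AX = 0}, given the diagonal of A' = QAP^{-1} and the n x n matrix P^{-1}."""
    basis = []
    for j in range(n):
        d = diag_entries[j] if j < len(diag_entries) else 0
        if d == 0:
            # x'_j is arbitrary, so X' = e_j is a solution; X = P^{-1} e_j
            # is the j-th column of P^{-1}.
            basis.append([p_inv[i][j] for i in range(n)])
    return basis


A = [[2, 4]]
P_INV = [[1, -2], [0, 1]]
A_PRIME = mat_mul(A, P_INV)
print(A_PRIME)  # [[2, 0]]

solutions = kernel_basis([A_PRIME[0][0]], 2, P_INV)
print(solutions)                 # [[-2, 1]]
print(mat_vec(A, solutions[0]))  # [0]
```
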

Next, let us examine the image of $\varphi: Z^n \to Z^m$, the map defined by multiplication
by the integer matrix A as before. We can describe this image as the set of
vectors $B \in Z^m$ such that the system of integer equations AX = B has an integer
solution. We will often denote this image by $AZ^n$. Multiplication by A sends the basis
vectors $e_1, \ldots, e_n \in Z^n$ to the columns

(4.10) $$A_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix}, \ \ldots, \ A_n = \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix}$$


of A, so the image is the set of integer linear combinations of these columns. In other
words, the columns generate the image.

We can turn this description around, starting with an arbitrary subgroup S of
the free abelian group $Z^m$ which is given to us explicitly by a set of generators
$A_1, \ldots, A_n \in Z^m$. Let A be the matrix whose columns are $A_i$. Then S is the image of
left multiplication by A. This interpretation of S as the image of a homomorphism
tells us the meaning of left and right multiplication by invertible integer matrices Q
and $P^{-1}$: Left multiplication by Q corresponds to a change of basis in the module $Z^m$,
the range of the map. Its effect is to multiply each of the generators $A_i$ by Q. On the
other hand, right multiplication by $P^{-1}$ represents a change of basis in the domain
$Z^n$. This changes the generating set of S. For example, adding r times column 1 to
column 2 changes $A_2$ to $A_2' = A_2 + rA_1$ and leaves the other generators unchanged.
Combining these observations with diagonalization results in the following theorem:


(4.11) Theorem. Let S be a subgroup of a free abelian group W of rank m. There
is a basis $(w_1, \ldots, w_m)$ of W and a basis $(u_1, \ldots, u_n)$ of S with the following properties:
(i) $n \le m$, (ii) for each $j \le n$ there is a positive integer $d_j$ such that $u_j = d_j w_j$, and
(iii) $d_1 | d_2 | d_3 \cdots$.


(4.12) Corollary. Every subgroup of a free abelian group of rank m is free, and
its rank is at most m. □


Proof of Theorem (4.11). Roughly speaking, we need only choose a basis
$B = (w_1, \ldots, w_m)$ for W and a set of generators $(u_1, \ldots, u_n)$ for S, to obtain an m × n
matrix A which represents S as above. The diagonalization theorem gives us a diagonal
matrix $A' = QAP^{-1}$ representing S with respect to a new basis $B' = (w_1', \ldots, w_m')$
and new generating set $(u_1', \ldots, u_n')$. Then $u_j' = d_j w_j'$. We drop the primes to obtain
the basis and generating set required. This completes the proof except for three
points.

First, we may have n > m, that is, there may be more columns than rows. But
if so, then since A' is diagonal, its jth column is zero for each j > m; hence the
corresponding generator $u_j$ is zero too. The zero element is useless as a generator, so we
throw it out. For the same reason, we may throw out a generator $u_j$ whenever
$d_j = 0$. After we do this, all $d_j$ will be positive, and we will have $n \le m$.

Notice that if S is the zero subgroup, we will end up throwing out all the generators.
As with vector spaces, we must adopt the convention that the empty set
generates the zero module, or else make a special mention of this exceptional case in
the statement of the theorem.
Next, we verify that if the basis and generating set are chosen so that $d_j > 0$
and $n \le m$, then $(u_1, \ldots, u_n)$ is a basis of S. Since it generates S, what has to be
proved is that $(u_1, \ldots, u_n)$ is independent. We rewrite a linear relation
$r_1 u_1 + \cdots + r_n u_n = 0$ in the form $r_1 d_1 w_1 + \cdots + r_n d_n w_n = 0$. Since $(w_1, \ldots, w_m)$
is a basis, $r_i d_i = 0$ for each i, and since $d_i > 0$, $r_i = 0$.

The final point is more serious: We need a finite set of generators of S to get 
started. How do we know that there is such a set? It is a fact that every subgroup of 
a finitely generated abelian group is itself finitely generated. We will prove this in 
Section 5. For the moment, the theorem is proved only with the additional hypothesis
that S is finitely generated. The hypothesis that W is finitely generated cannot be
removed. □


Theorem (4.11) is quite explicit. Let S be the subgroup of $Z^m$ generated by the
columns of a matrix A, and suppose that $A' = QAP^{-1}$ is diagonal. To display S in the
form asserted in the theorem, we rewrite this equation in the form


(4.13) $Q^{-1}A' = AP^{-1}$,


and we interpret it as follows: The columns of the matrix $AP^{-1}$ form our new set of
generators for S. Since the matrix A' is diagonal, (4.13) tells us that the new generators
are multiples of the columns of $Q^{-1}$. We change the basis of $Z^m$ from the standard
basis to the basis made up of the columns of $Q^{-1}$. The matrix of this change of
basis is Q [see Chapter 3 (4.21)]. Then the new generators are multiples of the new
basis elements.

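For instance, let S be the lattice in $\mathbb{R}^2$ generated by the two columns of the matrix
A of Example (4.5). Then


(4.14) $$Q^{-1}A' = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = AP^{-1}.$$


The new basis of $Z^2$ is $(w_1', w_2') = \left(\begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$, and the new generators of S are
$(u_1', u_2') = (u_1, u_2)P^{-1} = (w_1', 5w_2')$.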
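
The equation $Q^{-1}A' = AP^{-1}$ is easy to confirm by machine. The sketch below assumes the matrices of Example (4.5) to be A with rows (2, −1), (1, 2), $P^{-1}$ with rows (1, 1), (1, 2), $Q^{-1}$ with rows (1, 0), (3, 1), and A' = diag(1, 5); the helper `mat_mul` and the variable names are inventions of this example. It checks that both products agree and that the columns of the common value are the new basis vectors of $Z^2$ multiplied by 1 and 5.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[2, -1], [1, 2]]
P_INV = [[1, 1], [1, 2]]
Q_INV = [[1, 0], [3, 1]]
A_PRIME = [[1, 0], [0, 5]]

lhs = mat_mul(Q_INV, A_PRIME)  # new basis vectors scaled by the diagonal entries
rhs = mat_mul(A, P_INV)        # new generators of S
print(lhs)  # [[1, 0], [3, 5]]
print(rhs)  # [[1, 0], [3, 5]]

# Columns of Q^{-1} are the new basis (w1', w2'); columns of AP^{-1} are (u1', u2').
w1, w2 = [row[0] for row in Q_INV], [row[1] for row in Q_INV]
u1, u2 = [row[0] for row in rhs], [row[1] for row in rhs]
print(u1 == w1, u2 == [5 * x for x in w2])  # True True
```
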

Theorem (4.3) is striking when it is used to describe the relative position of a
sublattice S in a lattice L. To illustrate this, it will be enough to consider plane lattices.
The theorem asserts that there are bases $(v_1, v_2)$ and $(w_1, w_2)$ of L and S such
that the coordinate vectors of $w_j$ with respect to the basis $(v_1, v_2)$ are diagonal. Let
us refer the lattice L back to $Z^2 \subset \mathbb{R}^2$ by means of the basis $(v_1, v_2)$. Then the equations
$w_j = d_j v_j$ show that S looks like this figure, in which we have taken $d_1 = 2$
and $d_2 = 4$:
(4.15) Figure. [Diagram omitted: the sublattice S (points marked *) inside L, for $d_1 = 2$ and $d_2 = 4$.]
Notice the fact, which we have asserted before [Chapter 11 (10.10)], that the index
[L:S] is the ratio of the areas of the parallelograms spanned by bases. This is evident
when the bases are in such a relative position.


In practice, when the lattices L and S are given to us in $\mathbb{R}^2$ at the start, the
change of basis required to get such “commensurable” bases of L and S leads to
rather long and thin parallelograms, as is shown below for Example (4.14).


(4.16) Figure. Diagonalization, applied to a sublattice. [Lattice diagram omitted.]
5. GENERATORS AND RELATIONS FOR MODULES


In this section we turn our attention to modules which are not free. We will show
how to describe a large class of modules by means of matrices called presentation
matrices. We will then apply the diagonalization procedure for these matrices to the
study of abelian groups.

As an example to keep in mind, we may consider an abelian group or
Z-module V which is generated by three elements $(v_1, v_2, v_3)$. We suppose that these
generators are subject to the relations

(5.1)
$$
\begin{aligned}
3v_1 + 2v_2 + v_3 &= 0 \\
8v_1 + 4v_2 + 2v_3 &= 0 \\
7v_1 + 6v_2 + 2v_3 &= 0 \\
9v_1 + 6v_2 + v_3 &= 0.
\end{aligned}
$$


The information describing this module is summed up in the matrix


(5.2) $$A = \begin{bmatrix} 3 & 8 & 7 & 9 \\ 2 & 4 & 6 & 6 \\ 1 & 2 & 2 & 1 \end{bmatrix},$$


whose columns are the coefficients of the relations (5.1):
$(v_1, v_2, v_3)A = (0, 0, 0, 0)$.


As usual, scalars appear on the right side in this matrix product. It is this method of 
describing a module which we plan to formalize. 
If $(v_1, \dots, v_m)$ are elements of an R-module V, equations of the form

(5.3)  $a_1v_1 + \cdots + a_mv_m = 0, \quad a_i \in R,$

are called relations among the elements. Of course, when we refer to (5.3) as a
relation, we mean that the formal expression is a relation: If we evaluate it in V, we get
0 = 0. Since the relation is determined by the R-vector $(a_1, \dots, a_m)^t$, we will refer
to this vector as a relation vector, meaning that (5.3) is true in V. By a complete set of
relations we mean a set of relation vectors such that every relation vector is a linear
combination of this set. It is clear that a matrix such as (5.2) will not describe the
module V completely, unless its columns form a complete set of relations.

The concept of a complete set of relations can be confusing. It becomes much
clearer when we work with homomorphisms of free modules rather than directly
with the relations or the relation vectors. Let an $m \times n$ matrix A with entries in a
ring R be given. As we know, left multiplication by this matrix is a homomorphism of
R-modules

(5.4)  $\varphi\colon R^n \to R^m.$

In addition to the kernel and image, which we described in the last section when
$R = \mathbb{Z}$, there is another important auxiliary module associated with a
homomorphism $\varphi\colon W \to W'$ of R-modules, called its cokernel. The cokernel of
$\varphi$ is defined to be the quotient module

(5.5)  $W'/(\operatorname{im} \varphi).$

If we denote the image of left multiplication by A by $AR^n$, the cokernel of (5.4)
is $R^m/AR^n$. This cokernel is said to be presented by the matrix A. More generally,
we will call any isomorphism

(5.6)  $\sigma\colon R^m/AR^n \to V$

a presentation of a module V, and we say that the matrix A is a presentation matrix
for V if there is such an isomorphism.

For example, the cyclic group $\mathbb{Z}/(5)$ is presented as a $\mathbb{Z}$-module by
the $1 \times 1$ integer matrix [5]. As another example, let V be the $\mathbb{Z}$-module
presented by the matrix

$$\begin{bmatrix} 2 & -1\\ 1 & 2 \end{bmatrix}.$$

The columns of this matrix are the relation vectors, so V is generated by two elements
$v_1, v_2$ with the relations $2v_1 + v_2 = -v_1 + 2v_2 = 0$. We may solve the first
relation, obtaining $v_2 = -2v_1$. This allows us to eliminate the second generator.
Substitution into the second relation gives $-5v_1 = 0$. So V can also be generated by a
single generator $v_1$, with the single relation $5v_1 = 0$. This shows that V is
isomorphic to $\mathbb{Z}/(5)$. Thus this $2 \times 2$ matrix also presents the cyclic
group $\mathbb{Z}/(5)$.
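
The elimination above can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name `f` and the variable names are hypothetical, and the map $v_1 \mapsto 1$, $v_2 \mapsto -2$ is simply the one suggested by the relation $v_2 = -2v_1$. The sketch confirms that both relation vectors are sent to zero in $\mathbb{Z}/(5)$, that the map is surjective, and that the sublattice spanned by the columns has index $|\det A| = 5$ (the area ratio recalled from Chapter 11 (10.10)), so nothing else can lie in the kernel.

```python
# Columns of A are the relation vectors of the 2 x 2 example.
A = [[2, -1],
     [1, 2]]

def f(x1, x2):
    """Send x1*v1 + x2*v2 to a residue mod 5, using v1 -> 1 and v2 -> -2 (assumed map)."""
    return (x1 - 2 * x2) % 5

# Each column of A is a relation vector, so f must send it to 0.
columns = [(A[0][j], A[1][j]) for j in range(2)]
relations_hold = all(f(x1, x2) == 0 for x1, x2 in columns)

# f is surjective: the multiples of v1 already hit every residue.
image = sorted({f(x1, 0) for x1 in range(5)})

# |det A| is the index of the sublattice spanned by the columns of A.
index = abs(A[0][0] * A[1][1] - A[0][1] * A[1][0])

print(relations_hold, image, index)
```

Since the kernel of `f` contains the column lattice and both have index 5 in $\mathbb{Z}^2$, they coincide, which agrees with the conclusion that the matrix presents $\mathbb{Z}/(5)$.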

We will now describe a theoretical method of finding a presentation of a given
module V. To carry out this method in practice, the module would have to be given
in a very explicit way. Our first step is to choose a set of generators
$(v_1, \dots, v_m)$. So V must be finitely generated for us to get started. These
generators provide us with a surjective homomorphism

(5.7)  $f\colon R^m \to V,$

sending the column vector $X = (x_1, \dots, x_m)^t$ to $v_1x_1 + \cdots + v_mx_m$. The
elements of the kernel of f are the relation vectors. Let us denote this kernel by W. By
the First Isomorphism Theorem, V is isomorphic to $R^m/W$.

We repeat the procedure, choosing a set of generators $(w_1, \dots, w_n)$ for W, and
we use these generators to define a surjective homomorphism

(5.8)  $g\colon R^n \to W$

as before. Since W is a submodule of $R^m$, composition of the homomorphism g with
the inclusion $W \subset R^m$ gives us a homomorphism

(5.9)  $\varphi\colon R^n \to R^m.$

This homomorphism is left multiplication by a matrix A. By construction, W is the
image of $\varphi$, which is $AR^n$, so $R^m/AR^n = R^m/W \approx V$. Therefore, A is a
presentation matrix for V.

The columns of the matrix A are our chosen generators for the module W of
relations:

$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{bmatrix}.$$

Since they generate W, these columns form a complete set of relations among the
generators $(v_1, \dots, v_m)$ of the module V. Since the columns are relation vectors,

(5.10)  $(v_1, \dots, v_m)A = 0.$


Thus the presentation matrix A for a module V is determined by 


(5.11) 


(i) a set of generators for V, and 
(ii) a complete set of relations among these generators. 


We have let one point slip by in this description. In order to have a finite set of 
generators for the module of relations W, this module must be finitely generated. 
This does not look like a satisfactory hypothesis, because the relationship of our 
original module V with W is unclear. We don’t mind assuming that V is finitely gen- 
erated, but it isn’t good to impose hypotheses on a module which arises in the course 
of some auxiliary construction. We will need to examine this point more closely [see
(5.17)]. But except for this point, we can now speak of generators and relations for a
finitely generated R-module V. 

Since the presentation matrix depends on the choices (5.11), many matrices 
present the same module, or isomorphic modules. Here are some rules for manipu- 
lating a matrix A without changing the isomorphism class of the module it presents: 


(5.12) Proposition. Let A be an $m \times n$ presentation matrix for a module V. The
following matrices A' present the same module V:

(i) $A' = QAP^{-1}$, where $Q \in GL_m(R)$ and $P \in GL_n(R)$;
(ii) A' is obtained by deleting a column of zeros;
(iii) the jth column of A is $e_i$, and A' is obtained from A by deleting the ith row and
jth column.


Proof.

(i) The module $R^m/AR^n$ is isomorphic to V. Since the change of A to $QAP^{-1}$
corresponds to a change of basis in $R^m$ and $R^n$, the isomorphism class of the
quotient module does not change.
(ii) A column of zeros corresponds to the trivial relation, which can be omitted.
(iii) Suppose that the jth column of the matrix A is $e_i$. The corresponding relation is
$v_i = 0$. So it holds in the module V, and therefore $v_i$ can be left out of the
generating set $(v_1, \dots, v_m)$. Doing so changes the matrix A by deleting the ith row
and jth column. □


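It may be possible to simplify a matrix quite a lot by these rules. For instance,
our original example of the integer matrix (5.2) reduces as follows:

$$A = \begin{bmatrix} 3&8&7&9\\ 2&4&6&6\\ 1&2&2&1 \end{bmatrix}
\to \begin{bmatrix} 0&2&1&6\\ 0&0&2&4\\ 1&2&2&1 \end{bmatrix}
\to \begin{bmatrix} 2&1&6\\ 0&2&4 \end{bmatrix}
\to \begin{bmatrix} 0&1&0\\ -4&0&-8 \end{bmatrix}
\to [-4 \;\; -8] \to [-4 \;\; 0] \to [4].$$

Thus A presents the abelian group $\mathbb{Z}/(4)$.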

By definition, an $m \times n$ matrix presents a module by means of m generators
and n relations. But as we see from this example, the number of generators and the
number of relations depend on choices. They are not uniquely determined by the
module.

Consider two more examples: The $2 \times 1$ matrix $\begin{bmatrix} 4\\ 0 \end{bmatrix}$
presents an abelian group V by means of two generators $(v_1, v_2)$ and one relation
$4v_1 = 0$. We cannot simplify this matrix. The group which it presents is isomorphic to
the product group $\mathbb{Z}/(4) \times \mathbb{Z}$. On the other hand, the matrix
$[4 \;\; 0]$ presents a group with one generator $v_1$ and two relations, the second of
which is the trivial relation. This group is $\mathbb{Z}/(4)$.
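
The three integer examples above can be reproduced mechanically. The sketch below is one possible way to carry out the integer row and column operations of rule (5.12)(i); the function names `diagonalize` and `describe` are hypothetical, and the pivoting strategy (always move an entry of least absolute value to the corner) is an assumption of this sketch rather than a procedure prescribed by the text. Reading the group off the diagonal uses the interpretation given in Section 6: entries equal to 1 are dropped, and generators that carry no relation contribute a free factor.

```python
def diagonalize(matrix):
    """Return the nonzero diagonal entries d_1 | d_2 | ... reached by integer row/column operations."""
    a = [list(row) for row in matrix]
    m, n = len(a), len(a[0])
    diagonal = []
    t = 0
    while t < min(m, n):
        # Choose a nonzero entry of least absolute value in the remaining block.
        candidates = [(abs(a[i][j]), i, j)
                      for i in range(t, m) for j in range(t, n) if a[i][j] != 0]
        if not candidates:
            break
        _, pi, pj = min(candidates)
        a[t], a[pi] = a[pi], a[t]                  # row interchange
        for row in a:
            row[t], row[pj] = row[pj], row[t]      # column interchange
        p = a[t][t]
        for i in range(t + 1, m):                  # clear column t, up to remainders
            q = a[i][t] // p
            for j in range(t, n):
                a[i][j] -= q * a[t][j]
        for j in range(t + 1, n):                  # clear row t, up to remainders
            q = a[t][j] // p
            for i in range(t, m):
                a[i][j] -= q * a[i][t]
        if any(a[i][t] for i in range(t + 1, m)) or any(a[t][j] for j in range(t + 1, n)):
            continue                               # a smaller remainder appeared: repeat
        bad = [i for i in range(t + 1, m) for j in range(t + 1, n) if a[i][j] % p]
        if bad:
            for j in range(t, n):                  # bring a non-multiple of p into row t
                a[t][j] += a[bad[0]][j]
            continue
        diagonal.append(abs(p))
        t += 1
    return diagonal


def describe(matrix):
    """Return (orders of the finite cyclic factors, rank of the free part) of the presented group."""
    diagonal = diagonalize(matrix)
    orders = [d for d in diagonal if d != 1]       # rule (5.12)(iii): an entry 1 can be dropped
    free_rank = len(matrix) - len(diagonal)        # generators not involved in any relation
    return orders, free_rank


print(describe([[3, 8, 7, 9], [2, 4, 6, 6], [1, 2, 2, 1]]))
print(describe([[4], [0]]))
print(describe([[4, 0]]))
```

The first call corresponds to the matrix (5.2), the second to the $2 \times 1$ matrix with one relation, and the third to the $1 \times 2$ matrix with a trivial second relation.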

We will now discuss the problem of finite generation of the module of rela- 
tions. For modules over a nasty ring, this module needn't be finitely generated, even 
though V is. Fortunately this problem does not occur with the rings we have been 
studying, as we will now show. 


(5.13) Proposition. The following conditions on an R-module V are equivalent:

(i) Every submodule W of V is finitely generated;
(ii) ascending chain condition: There is no infinite strictly increasing chain
$W_1 < W_2 < \cdots$ of submodules of V.


Proof. Assume that V satisfies the ascending chain condition, and let W be a
submodule of V. We select a set $w_1, w_2, \dots, w_k$ of generators of W in the following
way: If W = 0, then W is generated by the empty set. If not, we start with a nonzero
element $w_1 \in W$. To continue, assume that $w_1, \dots, w_j$ have been chosen, and let
$W_j$ be the submodule generated by these elements. If $W_j$ is a proper submodule of W,
let $w_{j+1}$ be an element of W which is not contained in $W_j$. Then
$W_1 < W_2 < \cdots$. Since V satisfies the ascending chain condition, this chain of
submodules cannot be continued indefinitely. Therefore some $W_k$ is equal to W. Then
$(w_1, \dots, w_k)$ generates W.

The converse follows the proof of Theorem (2.10) of Chapter 11. Assume that every
submodule of V is finitely generated, and let $W_1 \subset W_2 \subset \cdots$ be an infinite
increasing chain of submodules of V. Let U denote the union of these submodules. Then U
is a submodule [see Chapter 11 (2.11)]; hence it is finitely generated. Let
$u_1, \dots, u_r$ be generators for U. Each $u_\nu$ is in one of the modules $W_i$, and
since the chain is increasing, there is an i such that all of the generators are in
$W_i$. Then the module U they generate is also in $W_i$, and we have
$U \subset W_i \subset W_{i+1} \subset U$. This shows that $U = W_i = W_{i+1}$ and that
the chain is not strictly increasing. □
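
The greedy selection of generators in the first half of the proof can be watched in the simplest case $R = V = \mathbb{Z}$. The sketch below is an illustration under stated assumptions: the function name `greedy_generators` is hypothetical, the submodule W is taken to be the subgroup spanned by a finite list of integers, and candidates are scanned in list order (the proof allows any element outside $W_j$). It relies on the fact that the subgroup of $\mathbb{Z}$ generated by $w_1, \dots, w_j$ consists of the multiples of their greatest common divisor.

```python
from math import gcd

def greedy_generators(elements):
    """Pick generators one at a time, each lying outside the subgroup generated so far.

    Returns the chosen generators and the list of gcds d_j, where W_j = (d_j).
    """
    chosen, chain = [], []
    d = 0  # gcd of the empty set: W_0 = 0
    for w in elements:
        in_current = (w == 0) if d == 0 else (w % d == 0)
        if not in_current:        # w lies outside W_j, so adjoining it enlarges the chain
            chosen.append(w)
            d = gcd(d, w)
            chain.append(d)
    return chosen, chain

chosen, chain = greedy_generators([12, 18, 8, 6, 7, 30])
print(chosen)
print(chain)
```

Each new gcd is a proper divisor of the previous one, so the chain $W_1 < W_2 < \cdots$ must stop after finitely many steps, which is the ascending chain condition for $\mathbb{Z}$ in miniature.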


(5.14) Lemma.

(a) Let $\varphi\colon V \to W$ be a homomorphism of R-modules. If the kernel and the
image of $\varphi$ are finitely generated modules, so is V. If V is finitely generated and
if $\varphi$ is surjective, then W is finitely generated. More precisely, suppose that
$(v_1, \dots, v_n)$ generates V and that $\varphi$ is surjective. Then
$(\varphi(v_1), \dots, \varphi(v_n))$ generates W.

(b) Let W be a submodule of an R-module V. If both W and V/W are finitely
generated, so is V. If V is finitely generated, so is V/W.


Proof. For the first assertion of (a), we follow the proof of the dimension
formula for linear transformations [Chapter 4 (1.5)], choosing a set of generators
$(u_1, \dots, u_k)$ for $\ker \varphi$ and a set of generators $(w_1, \dots, w_m)$ for
$\operatorname{im} \varphi$. We also choose elements $v_i \in V$ such that
$\varphi(v_i) = w_i$. Then we claim that the set $(u_1, \dots, u_k; v_1, \dots, v_m)$
generates V. Let $v \in V$ be arbitrary. Then $\varphi(v)$ is a linear combination of
$(w_1, \dots, w_m)$, say $\varphi(v) = a_1w_1 + \cdots + a_mw_m$. Let
$v' = a_1v_1 + \cdots + a_mv_m$. Then $\varphi(v') = \varphi(v)$. Hence
$v - v' \in \ker \varphi$, so $v - v'$ is a linear combination of $(u_1, \dots, u_k)$, say
$v - v' = b_1u_1 + \cdots + b_ku_k$. Therefore
$v = a_1v_1 + \cdots + a_mv_m + b_1u_1 + \cdots + b_ku_k$. This shows that the set
$(u_1, \dots, u_k; v_1, \dots, v_m)$ generates V, as required. The proof of the second
assertion of (a) is easy. Part (b) follows from part (a) by a consideration of the
canonical homomorphism $\pi\colon V \to V/W$. □


(5.15) Definition. A ring R is called noetherian if every ideal of R is finitely
generated.


Principal ideal domains are obviously noetherian, so the rings $\mathbb{Z}$,
$\mathbb{Z}[i]$, and F[x] (F a field) are noetherian.


(5.16) Corollary. Let R be a noetherian ring. Every proper ideal I of R is
contained in a maximal ideal.


Proof. If I is not maximal itself, then it is properly contained in a proper ideal
$I_2$, and if $I_2$ is not maximal, it is properly contained in a proper ideal $I_3$, and
so on. By the ascending chain condition (5.13), the chain
$I = I_1 < I_2 < I_3 < \cdots$ must be finite. Therefore $I_k$ is maximal for some k. □

The relevance of the notion of noetherian ring to our problem is shown by the
following proposition:


(5.17) Proposition. Let V be a finitely generated module over a noetherian ring R.
Then every submodule of V is finitely generated.


Proof. It suffices to prove the proposition in the case that $V = R^m$. For
assume that we have proved that the submodules of $R^m$ are finitely generated, for all
m. Let V be a finitely generated R-module. Then there is a surjective map
$\varphi\colon R^m \to V$. Given a submodule S of V, let $L = \varphi^{-1}(S)$. Then L is a
submodule of the module $R^m$, and hence L is finitely generated. Also, the map
$L \to S$ is surjective. Hence S is finitely generated (5.14).

To prove the proposition when $V = R^m$, we use induction on m. A submodule
of R is the same as an ideal of R (1.3). Thus the noetherian hypothesis on R tells us
that the proposition holds for $V = R^m$ when m = 1. Suppose m > 1. We consider
the projection

$$\pi\colon R^m \to R^{m-1}$$

given by dropping the last entry: $\pi(a_1, \dots, a_m) = (a_1, \dots, a_{m-1})$. Its kernel
is $\{(0, \dots, 0, a_m)\} \approx R$. Let $W \subset R^m$ be a submodule, and let
$\varphi\colon W \to R^{m-1}$ be the restriction of $\pi$ to W. The image $\varphi(W)$ is
finitely generated, by induction. Also, $\ker \varphi = (W \cap \ker \pi)$ is a submodule
of $\ker \pi \approx R$, so it is finitely generated too. By Lemma (5.14), W is finitely
generated, as required. □


This proposition completes the proof of Theorem (4.11). 

Since principal ideal domains are noetherian, submodules of finitely generated 
modules over these rings are finitely generated. But in fact, most of the rings which 
we have been studying are noetherian. This follows from another of Hilbert’s fa- 
mous theorems: 


(5.18) Theorem. Hilbert Basis Theorem: If a ring R is noetherian, then so is the
polynomial ring R[x].


The Hilbert Basis Theorem shows by induction that the polynomial ring
$R[x_1, \dots, x_n]$ in several variables over a noetherian ring R is noetherian, hence
that the rings $\mathbb{Z}[x_1, \dots, x_n]$ and $F[x_1, \dots, x_n]$ (F a field) are
noetherian. Also, quotients of noetherian rings are noetherian:


(5.19) Proposition. Let R be a noetherian ring, and let I be an ideal of R. The
quotient ring $\bar{R} = R/I$ is noetherian.

Proof. Let $\bar{J}$ be an ideal of $\bar{R}$, and let $J = \pi^{-1}(\bar{J})$ be the
corresponding ideal of R, where $\pi\colon R \to \bar{R}$ is the canonical map. Then J is
finitely generated, say by $(a_1, \dots, a_m)$. It follows that the finite set
$(\bar{a}_1, \dots, \bar{a}_m)$ generates $\bar{J}$ (5.14). □

Combining this proposition with the Hilbert Basis Theorem gives the following
result:


(5.20) Corollary. Any ring which is a quotient of a polynomial ring over the
integers or over a field is noetherian. □


Proof of the Hilbert Basis Theorem. Assume that R is noetherian, and let I be
an ideal of the polynomial ring R[x]. We must show that a finite set of polynomials
suffices to generate this ideal.

Let's warm up by reviewing the case that R is a field. In that case, we may
choose a nonzero polynomial $f \in I$ of lowest degree, say

(5.21)  $f(x) = a_nx^n + \cdots + a_1x + a_0, \quad a_n \neq 0,$

and prove that it generates the ideal as follows: Let

(5.22)  $g(x) = b_mx^m + \cdots + b_1x + b_0, \quad b_m \neq 0,$

be a nonzero element of I. Then the degree m of g is at least n. We use induction on
m. The polynomial

(5.23)  $g(x) - (b_m/a_n)x^{m-n}f(x) = g_1(x)$

is an element of I of degree < m. By induction, $g_1$ is divisible by f; hence g is
divisible by f.

Formula (5.23) is the first step in the division with remainder of g by f. The
method does not extend directly to arbitrary rings, because division with remainder
requires that the leading coefficient of f be a unit. More precisely, in order to form
the expression (5.23) we need to know that $a_n$ divides $b_m$ in the ring R, and there is
no reason for this to be true. We will need more generators.
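
The obstruction can be seen in a two-line computation. The following sketch performs the step (5.23) over the field $\mathbb{Q}$, with polynomials stored as coefficient lists in increasing degree; the function names and the sample polynomials $f(x) = 2x + 1$ and $g(x) = x^2$ are hypothetical choices made for illustration. The quotient $b_m/a_n$ comes out as 1/2, so the same step is not available in $\mathbb{Z}[x]$.

```python
from fractions import Fraction

def degree(p):
    """Degree of a nonzero polynomial stored as [c_0, c_1, ..., c_d] with c_d != 0."""
    return len(p) - 1

def reduction_step(g, f):
    """Return (b_m/a_n, g_1) with g_1 = g - (b_m/a_n) x^(m-n) f, assuming deg g >= deg f."""
    m, n = degree(g), degree(f)
    q = Fraction(g[-1], f[-1])          # b_m / a_n, computed in the field Q
    g1 = [Fraction(c) for c in g]
    for i, c in enumerate(f):
        g1[i + m - n] -= q * c
    while g1 and g1[-1] == 0:           # drop the cancelled leading term(s)
        g1.pop()
    return q, g1

f = [1, 2]        # f(x) = 2x + 1
g = [0, 0, 1]     # g(x) = x^2
q, g1 = reduction_step(g, f)
print(q, g1)
```

Here $g_1 = -x/2$ has degree 1 < 2, as the induction requires, but the coefficient 1/2 shows that $a_n = 2$ does not divide $b_m = 1$ in $\mathbb{Z}$.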

Let us denote by A the set of leading coefficients of all the polynomials in I,
together with the zero element of R.


(5.24) Lemma. The set A of leading coefficients of the polynomials in an ideal of
R[x], together with 0, forms an ideal of R.


Proof. If $\alpha = a_n$ is the leading coefficient of f, then $r\alpha$ is the leading
coefficient of rf, unless by chance $r\alpha = 0$. In both cases, $r\alpha \in A$. Next, let
$\alpha = a_n$ be the leading coefficient of f, and let $\beta = b_m$ be the leading
coefficient of g, where, say, $m \geq n$. Then $\alpha$ is also the leading coefficient of
$x^{m-n}f$. Hence the coefficient of $x^m$ in the polynomial $h = x^{m-n}f + g$ is
$\alpha + \beta$. This is the leading coefficient of h unless it is zero, and in either
case, $\alpha + \beta \in A$. □


We return to the proof of the Hilbert Basis Theorem. According to the lemma,
the set A is an ideal of the noetherian ring R, so there exists a finite set of
generators, say $(\alpha_1, \dots, \alpha_k)$, for this ideal. We choose for each i,
$1 \leq i \leq k$, a polynomial $f_i \in I$ with leading coefficient $\alpha_i$, and we
multiply these polynomials by powers of x as necessary, so that their degrees become
equal to some common integer n.

The set of polynomials $(f_1, \dots, f_k)$ obtained in this way will allow us to adapt
the induction step (5.23), but it will probably not generate I. We have little chance of
finding a polynomial of degree < n in the ideal $(f_1, \dots, f_k)$. So we must add some
elements of low degree to get generators for our ideal. The following lemma is easy,
and we omit its proof:


(5.25) Lemma. Let $P_n$ denote the set of polynomials in R[x] which have
degree < n, together with zero, and let $S_n = I \cap P_n$. Then $S_n$ is an R-submodule
of the R-module $P_n$.


The R-module $P_n$ is generated by the monomials $1, x, \dots, x^{n-1}$, so it is finitely
generated. Since R is noetherian, we may use Lemma (5.25) and Proposition (5.17) to
conclude that there is a finite set $(h_1, \dots, h_s)$ of elements which generates $S_n$
as an R-module. We claim that the combined set $(f_1, \dots, f_k; h_1, \dots, h_s)$
generates I.

Denote by J the ideal generated by this set. By construction, $J \subset I$. We need
to prove the opposite inclusion, and we use induction on the degree of an element
$g \in I$. We denote this degree by m. If m < n, then $g \in S_n$, and therefore g is a
linear combination of $(h_1, \dots, h_s)$, with coefficients in R. So $g \in J$ in that
case. Assume that $m \geq n$, and let the leading coefficient of g be $\beta = b_m$. Then
$\beta$ is in the ideal A of leading coefficients, so it is a linear combination of the
generators of that ideal, say $\beta = r_1\alpha_1 + \cdots + r_k\alpha_k$. Remembering
that $\alpha_i$ is the leading coefficient of $f_i$, we see that the polynomial

$$p = \sum_{i=1}^{k} r_i x^{m-n} f_i$$

has the same leading coefficient and the same degree as g, and it is in J. So
$g_1 = g - p$ has degree less than m. By induction, $g_1 \in J$, and hence $g \in J$. □


6. THE STRUCTURE THEOREM FOR ABELIAN GROUPS 


The Structure Theorem for abelian groups asserts that a finitely generated abelian 
group V is a direct sum of cyclic groups. The work of the proof has already been 
done. We know that there exists a diagonal presentation matrix for V, and what re- 
mains for us to do is to interpret the meaning of this diagonal matrix for the group. 

We first need to extend the concept of direct sum from vector spaces to
arbitrary modules. The definition is the same. Let $W_1, \dots, W_k$ be submodules of a
module V. Their sum is the submodule which they generate. It consists of all sums

(6.1)  $W_1 + \cdots + W_k = \{v \in V \mid v = w_1 + \cdots + w_k, \text{ with } w_i \in W_i\}.$

The verification that this is a submodule is routine, and it is the same as for sums of
subspaces of a vector space. We say that V is the direct sum of the submodules $W_i$ if

(i) they generate: $V = W_1 + \cdots + W_k$;
(ii) they are independent: If $w_1 + \cdots + w_k = 0$, with $w_i \in W_i$, then $w_i = 0$
for each i.


Thus V is the direct sum of the submodules $W_i$ if every element $v \in V$ can be
written uniquely in the form $v = w_1 + \cdots + w_k$, with $w_i \in W_i$. As with vector
spaces, two submodules $W_1, W_2$ are independent if and only if $W_1 \cap W_2 = 0$ [see
Chapter 3 (6.5)].


The symbol $\oplus$ is used to denote direct sums as before. So the notation

(6.3)  $V = W_1 \oplus \cdots \oplus W_k$

means that V is the direct sum of the submodules $W_i$.


(6.4) Theorem. Structure Theorem for abelian groups: Let V be a finitely
generated abelian group. Then V is a direct sum of finite cyclic subgroups
$C_{d_1}, \dots, C_{d_k}$ and a free abelian group L:

$$V = C_{d_1} \oplus \cdots \oplus C_{d_k} \oplus L,$$

where the order $d_i$ of $C_{d_i}$ is greater than 1, and $d_1 \mid d_2 \mid \cdots \mid d_k$.


We will use additive notation for the law of composition in the cyclic group here. So
$C_n$ is generated by one element v, with one relation nv = 0. Thus $C_n$ is isomorphic
to $\mathbb{Z}/(n)$. The isomorphism $\mathbb{Z}/(n) \to C_n$ sends the residue of an
integer r to rv.


Proof of the theorem. We choose a presentation matrix A for V, determined by
a set of generators and a complete set of relations. We can do this because V is
finitely generated and because $\mathbb{Z}$ is a noetherian ring (see Section 5). By
Proposition (5.12), the matrix A may be replaced by $QAP^{-1}$, where Q and P are
invertible. Therefore we may assume that A is diagonal, that the diagonal entries are
nonzero, and that each diagonal entry divides the next. Moreover, we can drop any
column of zeros, and any row and column in which the diagonal entry is 1 (5.12). So we
may assume that the diagonal entries $d_i$ are not 0 or 1. The matrix A will then have
the shape

(6.5)
$$A = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_k\\ & 0 & \end{bmatrix}.$$

It will therefore be an $m \times k$ matrix, where $k \leq m$. The meaning of this in terms
of generators and relations for our module is that V is generated by m elements
$v_1, \dots, v_m$, and that

(6.6)  $d_1v_1 = 0,\ d_2v_2 = 0,\ \dots,\ d_kv_k = 0$

forms a complete set of relations among these generators.

For j = 1, ..., k, let us denote by $C_j$ the cyclic subgroup generated by $v_j$. Let L
be the subgroup generated by the remaining generators $v_{k+1}, \dots, v_m$. Since the
columns of (6.5) are a complete set of relations, there is no relation involving these
last m - k generators. Therefore L is a free abelian group of rank m - k. We now
verify that $V = C_1 \oplus \cdots \oplus C_k \oplus L$ and that $C_j$ is a cyclic group of
order $d_j$. First, since V is generated by the $v_i$ and since each of the $v_i$ is
included in one of the summands, it is clear that V is the sum of these subgroups. Next,
suppose that we have a relation, say

$$z_1 + \cdots + z_k + w = 0,$$

where $z_j \in C_j$ and $w \in L$. Since $C_j$ is the cyclic group generated by $v_j$, we
can write $z_j = r_jv_j$ for some integer $r_j$. Similarly, we may write
$w = r_{k+1}v_{k+1} + \cdots + r_mv_m$ for some integers $r_i$. Then the relation has the
form

$$r_1v_1 + \cdots + r_mv_m = 0.$$

Since the columns of (6.5) form a complete set of relations, the vector
$(r_1, \dots, r_m)^t$ is a linear combination of these columns. So $r_j = 0$ if j > k,
which implies that w = 0. In addition, $r_j$ must be divisible by $d_j$ if $j \leq k$, say
$r_j = d_js_j$. Then $z_j = s_jd_jv_j = 0$. Thus the relation was trivial, and this shows
that the subgroups are independent. It also shows that the order of the cyclic group
$C_j$ is $d_j$. So we have $V = C_{d_1} \oplus \cdots \oplus C_{d_k} \oplus L$, as
required. □


A finite abelian group is finitely generated, so as stated above the Structure
Theorem decomposes a finite abelian group into a direct sum of finite cyclic groups,
in which the order of each summand divides the next. The free abelian summand is
zero in this case. It is sometimes convenient to decompose the cyclic groups further,
into cyclic groups of prime power order. This decomposition is based on Proposition
(8.4) of Chapter 2, which we restate here:


(6.7) Lemma. Let r, s be relatively prime integers. The cyclic group $C_{rs}$ of order
rs is the direct sum of cyclic subgroups of orders r and s.
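
For small r and s the statement (6.7) can be verified by brute force in $\mathbb{Z}/(rs)$. The sketch below is only an illustration; the function name `splits_as_direct_sum` is hypothetical. It takes the subgroups generated by s and by r, which have orders r and s respectively, and tests the two conditions for a direct sum: the subgroups meet only in 0, and their sums exhaust the group.

```python
def splits_as_direct_sum(r, s):
    """Test whether Z/(rs) is the direct sum of the subgroups generated by s and by r."""
    n = r * s
    W1 = {(s * k) % n for k in range(n)}   # cyclic subgroup generated by s; it has order r
    W2 = {(r * k) % n for k in range(n)}   # cyclic subgroup generated by r; it has order s
    sums = {(a + b) % n for a in W1 for b in W2}
    independent = (W1 & W2 == {0})
    return len(W1) == r and len(W2) == s and independent and len(sums) == n

print(splits_as_direct_sum(3, 4))
print(splits_as_direct_sum(2, 4))
```

The second call uses r = 2, s = 4, which are not relatively prime; there the two subgroups share the element 4 of $\mathbb{Z}/(8)$, so independence fails and the hypothesis of (6.7) is seen to be needed.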

Combining this lemma with the Structure Theorem yields the following:

(6.8) Corollary. Structure Theorem, alternate form: Every finitely generated
abelian group is a direct sum of cyclic groups of prime power orders and of a free
abelian group. □


It is natural to ask whether the orders of the cyclic subgroups which decompose
a given finite abelian group are uniquely determined by the group. If the order of V
is a product of distinct primes, there is no problem. For example, if the order is 30,
then V must be isomorphic to $C_2 \oplus C_3 \oplus C_5$. But can the same group be both
$C_2 \oplus C_2 \oplus C_4$ and $C_4 \oplus C_4$? It is not difficult to show that this is
impossible by counting elements of orders 1 or 2. The group $C_4 \oplus C_4$ contains
four such elements, while $C_2 \oplus C_2 \oplus C_4$ contains eight. This counting method
will always work.
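
The count of elements of order 1 or 2 is easy to automate. In the sketch below (the function name `count_killed_by` is hypothetical) a direct sum of cyclic groups is modeled as tuples of residues, and an element has order 1 or 2 exactly when doubling it gives zero in every coordinate. The two calls compare $C_4 \oplus C_4$ with $C_2 \oplus C_2 \oplus C_4$, both of order 16.

```python
from itertools import product

def count_killed_by(n, orders):
    """Count elements x of Z/(d_1) x ... x Z/(d_k) with n*x = 0."""
    return sum(
        all((n * x) % d == 0 for x, d in zip(element, orders))
        for element in product(*(range(d) for d in orders))
    )

print(count_killed_by(2, [4, 4]))
print(count_killed_by(2, [2, 2, 4]))
```

Because the two counts differ, the groups cannot be isomorphic; repeating the count with n = 4, 8, ... is the kind of bookkeeping that the counting method relies on in general.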


(6.9) Theorem. Uniqueness for the Structure Theorem:


(a) Suppose that a finite abelian group V is a direct sum of cyclic groups
$C_{d_1} \oplus \cdots \oplus C_{d_k}$, where $d_1 \mid d_2 \mid \cdots$. The integers $d_i$
are determined by the group V.

(b) The same is true if the decomposition is into prime power orders, that is, if
each $d_i$ is the power of a prime.


We leave the proof as an exercise. □


The counting of elements is simplified notationally by representing a direct
sum as a product. Let R be a ring. The direct product of R-modules $W_1, \dots, W_k$ is
the product set $W_1 \times \cdots \times W_k$ of k-tuples:

(6.10)  $W_1 \times \cdots \times W_k = \{(w_1, \dots, w_k) \mid w_i \in W_i\}.$

It is made into a module by vector addition and scalar multiplication:

$$(w_1, \dots, w_k) + (w_1', \dots, w_k') = (w_1 + w_1', \dots, w_k + w_k'), \quad r(w_1, \dots, w_k) = (rw_1, \dots, rw_k).$$

Verification of the axioms for a module is routine.

Direct products and direct sums are isomorphic, as the following proposition
shows:


(6.11) Proposition. Let $W_1, \dots, W_k$ be submodules of an R-module V.

(a) The map $\sigma\colon W_1 \times \cdots \times W_k \to V$ defined by

$$\sigma(w_1, \dots, w_k) = w_1 + \cdots + w_k$$

is a homomorphism of R-modules, and its image is the sum $W_1 + \cdots + W_k$.

(b) The homomorphism $\sigma$ is an isomorphism if and only if V is the direct sum of
the submodules $W_i$.


We have seen similar arguments several times before, so we omit the proof. Note
that the second part of the proposition is analogous to the statement that the map
(2.5) $R^k \to V$ defined by a set $(v_1, \dots, v_k)$ is bijective if and only if this set
is a basis. □


Since a cyclic group $C_d$ of order d is isomorphic to the standard cyclic group
$\mathbb{Z}/(d)$, we can use Proposition (6.11) to restate the Structure Theorem as
follows:

(6.12) Theorem. Product version of the Structure Theorem: Every finitely
generated abelian group V is isomorphic to a direct product of cyclic groups

$$\mathbb{Z}/(d_1) \times \cdots \times \mathbb{Z}/(d_k) \times \mathbb{Z}^r,$$

where $d_i$, r are integers. There is a decomposition in which each $d_i$ divides the next
and one in which each $d_i$ is a prime power. □


This classification of abelian groups carries over to Euclidean domains without
essential change. Since a Euclidean domain R is noetherian, any finitely generated
R-module V has a presentation matrix (5.6), and by the diagonalization theorem
(4.6) there is a presentation matrix A which is diagonal.

To carry along the analogy with abelian groups, we define a cyclic R-module V
to be one which is generated by a single element v. This is equivalent to saying
that V is isomorphic to a quotient module R/I, where I is the ideal of elements a of R
such that av = 0. Namely, the map $\varphi\colon R \to V$ sending $r \mapsto rv$ is a
surjective homomorphism of modules because v generates V, and the kernel of $\varphi$,
the module of relations, is a submodule of R, an ideal I (1.3). So V is isomorphic to
R/I by the First Isomorphism Theorem. Conversely, if $R/I \to V$ is an isomorphism, the
image of 1 will generate V. If R is a Euclidean domain, then the ideal I will be
principal, so V will be isomorphic to R/(a) for some $a \in R$. In this case the module of
relations will also be generated by a single element.

Proceeding as in the case of abelian groups, one proves the following theorem: 


(6.13) Theorem. Structure Theorem for modules over Euclidean domains:


(a) Let V be a finitely generated module over a Euclidean domain R. Then V is a
direct sum of cyclic modules $C_j$ and a free module L. Equivalently, there is an
isomorphism

$$\varphi\colon V \to R/(d_1) \times \cdots \times R/(d_k) \times R^r$$

of V with a direct product of cyclic modules $R/(d_i)$ and a free module $R^r$,
where r is nonnegative, the elements $d_1, \dots, d_k$ are not units and not zero, and
$d_i$ divides $d_{i+1}$ for each i = 1, ..., k - 1.

(b) The same assertion as (a), except that the condition that $d_i$ divides $d_{i+1}$ is
replaced by this: Each $d_i$ is a power of a prime element of R. Thus V is isomorphic
to a product of the form

$$R/(p_1^{e_1}) \times \cdots \times R/(p_n^{e_n}) \times R^r,$$

with repetitions of primes allowed. □


For example, consider the F[t]-module V presented by the matrix A of
Example (4.7). According to (5.12), it is also presented by the diagonal matrix

$$\begin{bmatrix} 1 & 0\\ 0 & (t-1)^2(t-2) \end{bmatrix},$$

and we can drop the first row and column from this matrix (5.12). So V is presented
by the $1 \times 1$ matrix [g], where $g(t) = (t-1)^2(t-2)$. This means that V is a cyclic
module, isomorphic to F[t]/(g). Since g has two relatively prime factors, V can be
further decomposed. It is isomorphic to the direct product of two cyclic modules

(6.14)  $V \approx F[t]/(g) \approx [F[t]/(t-1)^2] \times [F[t]/(t-2)].$ □


With slightly more work, Theorem (6.13) can be extended to modules over 
any principal ideal domain. It is also true that the prime powers occurring in (b) are 
unique up to unit factors. A substitute for the counting argument which proves Theo- 
rem (6.9) must be found to prove this fact. We will not carry out the proof. 


7. APPLICATION TO LINEAR OPERATORS


In this section we apply the theory developed in the last section in a novel way to
linear operators on vector spaces over a field. This application provides a good
example of the way "proof analysis" can lead to new results in mathematics. The
method developed first for abelian groups is extended formally to modules over
Euclidean domains. Then it is applied to a concrete new situation in which the ring is a
polynomial ring. This was not the historical development. The theories for abelian
groups and for linear operators were developed independently and were tied together
later. But it is striking that the two cases, abelian groups and linear operators, can be
formally analogous and yet end up looking so different when the same theory is
applied to them.

The key observation which allows us to proceed is that if we are given a linear
operator

(7.1)  $T\colon V \to V$

on a vector space over a field F, then we can use this operator to make V into a
module over the polynomial ring F[t]. To do so, we have to define multiplication of a
vector v by a polynomial $f(t) = a_nt^n + \cdots + a_1t + a_0$. We set

(7.2)  $f(t)v = a_nT^n(v) + a_{n-1}T^{n-1}(v) + \cdots + a_1T(v) + a_0v.$

The right side can be written as [f(T)](v), where f(T) denotes the linear operator
$a_nT^n + a_{n-1}T^{n-1} + \cdots + a_1T + a_0I$ obtained by substituting T for t. The
brackets have been added only for clarity. With this notation, we obtain the formulas

(7.3)  $tv = T(v)$ and $f(t)v = [f(T)](v).$

The fact that rule (7.2) makes V into an F[t]-module is easy to verify. The formulas
(7.3) may appear tautological. They raise the question of why we need a new symbol
t. But remember that f(t) is a formal polynomial, while f(T) denotes a certain linear
operator.
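
Rule (7.2) is straightforward to evaluate when T is given by a matrix. The following sketch is an illustration only: the function names `apply_matrix` and `poly_act`, the choice of exact rational arithmetic, and the sample operator are all assumptions of the example, not part of the text. It computes $f(t)v$ by Horner's rule, which uses only repeated application of T.

```python
from fractions import Fraction

def apply_matrix(T, v):
    """Return T(v) for a square matrix T (list of rows) and a vector v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def poly_act(coeffs, T, v):
    """Compute f(t)v = a_n T^n(v) + ... + a_1 T(v) + a_0 v by Horner's rule.

    `coeffs` lists a_0, a_1, ..., a_n in order of increasing degree.
    """
    result = [Fraction(0)] * len(v)
    for a in reversed(coeffs):
        # result <- a*v + T(result); after the last pass this is the sum in (7.2)
        result = [a * x + y for x, y in zip(v, apply_matrix(T, result))]
    return result

T = [[2, 1],
     [0, 2]]
v = [0, 1]
print(poly_act([0, 1], T, v))        # the polynomial t acts as T
print(poly_act([4, -4, 1], T, v))    # f(t) = (t - 2)^2
```

For this sample operator, $(T - 2I)^2 = 0$, so the polynomial $(t-2)^2$ annihilates every vector; the formal polynomial and the operator it induces are visibly different objects.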

Conversely, let V be an F[t]-module. Then scalar multiplication of elements of
V by a polynomial f(t) is defined. In particular, we are given a rule for multiplying
by the constant polynomials, the elements of F. If we keep the rule for multiplying
by constants but forget for the moment about multiplication by nonconstant
polynomials, then the axioms (1.1) show that V becomes a vector space over F. Next, we
can multiply elements of V by the polynomial t. Let us denote the operation of
multiplication by t on V as T. Thus T is the map

(7.4)  $T\colon V \to V$, defined by $T(v) = tv$.

This map is a linear operator on V, when it is considered as a vector space over F.
For $t(v + v') = tv + tv'$ by the distributive law (1.1), and hence
$T(v + v') = T(v) + T(v')$. And if $c \in F$, then tcv = ctv by the associative law (1.1)
and the commutative law in F[t]; hence T(cv) = cT(v). So an F[t]-module V provides us
with a linear operator on a vector space.

The operations we have described, going from linear operators to modules and 
back, are inverses of each other: 


5) Linear operator on an F-vector space and F(t\-module 
are equivalent concepts. 


We will want to apply this observation to finite-dimensional vector spaces, but 
let us note in passing the linear operator which corresponds to the free F[t]-module 
F[t] of rank 1. We know that F[t] is infinite-dimensional when it is considered as a 
vector space over F. The monomials $(1, t, t^2, \ldots)$ form a basis, and we can use this 
basis to identify F[t] with the space Z of infinite F-vectors, as in Chapter 10 (2.8): 

$$Z = \{(a_0, a_1, a_2, \ldots) \mid a_i \in F \text{ and only finitely many } a_i \text{ are nonzero}\}.$$

Multiplication by t on F[t] corresponds to the shift operator T: 

$$(a_0, a_1, a_2, \ldots) \mapsto (0, a_0, a_1, a_2, \ldots).$$

Thus, up to isomorphism, the free F[t]-module of rank 1 corresponds to the shift 
operator on the space Z. 
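
As a small illustration, a polynomial can be stored as its list of coefficients (a_0, a_1, ...), and one can check that multiplying by t gives the same list as shifting. This is only a sketch; the function names `shift` and `poly_mul` are assumptions made for the example.

```python
def shift(a):
    """The shift operator: (a0, a1, a2, ...) -> (0, a0, a1, a2, ...)."""
    return [0] + list(a)


def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists, constant term first."""
    product = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            product[i + j] += fi * gj
    return product


t = [0, 1]            # the polynomial t
f = [5, 0, -2, 7]     # 5 - 2t^2 + 7t^3

print(shift(f))        # coefficients of t*f obtained by shifting
print(poly_mul(t, f))  # coefficients of t*f obtained by multiplying
```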

We now begin our application to linear operators. Given a linear operator T on 
a vector space V over F, we may also view V as an F[t]-module. Let us suppose that 
V is finite-dimensional as a vector space, say of dimension n. Then it is certainly 
finitely generated as a module, and hence it has a presentation matrix. There is some 
danger of confusion here because there are two matrices around: the presentation 
matrix for the module V, and the matrix of the linear operator T. The presentation 
matrix is an r × s matrix with polynomial entries, where r is the number of chosen 
generators for the module and s is the number of relations. On the other hand, the 
matrix of the linear operator is an n × n matrix whose entries are scalars, where n is 
the dimension of V as a vector space. Both matrices contain the information needed 
to describe the module and the linear operator. 

Regarding V as an F[t]-module, we can apply Theorem (6.13) to conclude that 
V is a direct sum of cyclic submodules, say 

$$V = W_1 \oplus \cdots \oplus W_k,$$

where $W_i$ is isomorphic to $F[t]/(p_i^{e_i})$, $p_i(t)$ being an irreducible polynomial in F[t]. 
There is no free summand, because we are assuming that V is finite-dimensional. 

We have two tasks: to interpret the meaning of the direct sum decomposition 
for the linear operator T, and to describe the linear operator when the module is 
cyclic. It will not be surprising that the direct sum decomposition gives us a block 
decomposition of the matrix of T, when a suitable basis is chosen. The reason is that 
each of the subspaces $W_i$ is T-invariant, because $W_i$ is an F[t]-submodule. 
Multiplication by t carries $W_i$ to itself, and t operates on V as the linear operator T. We 
choose bases $B_i$ for the subspaces $W_i$. Then the matrix of T with respect to the basis 
$B = (B_1, \ldots, B_k)$ has the desired block form [Chapter 4 (3.8)]. 

Next, let W be a cyclic F[t]-module. Then W is generated as a module by a 
single element w; in other words, every element of W can be written in the form 

$$g(t)w = b_r t^r w + \cdots + b_1 tw + b_0 w,$$

where $g(t) = b_r t^r + \cdots + b_1 t + b_0 \in F[t]$. This implies that the elements 
$w, tw, t^2w, \ldots$ span W as a vector space. In terms of the linear operator, W is 
spanned by the vectors $w, T(w), T^2(w), \ldots$. 

Various relations between properties of an F[t]-module and the corresponding 
linear operator are summed up in the table below. 


(7.6) Dictionary. 

    F[t]-module                      Linear operator T 

    multiplication by t              operation of T 
    free module of rank 1            shift operator 
    submodule                        T-invariant subspace 
    direct sum of submodules         direct sum of T-invariant subspaces 
    cyclic module generated by v     vector space spanned by v, T(v), T^2(v), ... 


Let us now compute the matrix of a linear operator T on a vector space which 
corresponds to a cyclic F[t]-module. Since every ideal of F[t] is principal, such a 
module will be isomorphic to a module of the form 

(7.7)    $W = F[t]/(f),$

where $f = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0$ is a polynomial in F[t]. Let us use the 
symbol $w_0$ to denote the residue of 1 in W. This is our chosen generator for the 
module. Then the relation $fw_0 = 0$ holds, and f generates the module of relations. 

The elements $w_0, tw_0, \ldots, t^{n-1}w_0$ form a basis for F[t]/(f) [see Chapter 10 
(5.7)]. Let us denote this basis by $w_i = t^i w_0$. Then 


$$tw_0 = w_1,\quad tw_1 = w_2,\quad \ldots,\quad tw_{n-2} = w_{n-1},$$

and also $fw_0 = 0$. This last relation can be rewritten using the others in order to 
determine the action of t on $w_{n-1}$: 

$$(t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0)w_0 = tw_{n-1} + a_{n-1}w_{n-1} + \cdots + a_1 w_1 + a_0 w_0 = 0.$$

Since T acts as multiplication by t, we have 

$$T(w_0) = w_1,\quad T(w_1) = w_2,\quad \ldots,\quad T(w_{n-2}) = w_{n-1},$$

and 

$$T(w_{n-1}) = -a_{n-1}w_{n-1} - \cdots - a_1 w_1 - a_0 w_0.$$


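This determines the matrix of T. It has the form illustrated below for various values 
of n: 

(7.8)
$$\begin{bmatrix} -a_0 \end{bmatrix},\quad
\begin{bmatrix} 0 & -a_0 \\ 1 & -a_1 \end{bmatrix},\quad
\begin{bmatrix} 0 & 0 & -a_0 \\ 1 & 0 & -a_1 \\ 0 & 1 & -a_2 \end{bmatrix},\quad \ldots,\quad
\begin{bmatrix} 0 & & & -a_0 \\ 1 & 0 & & -a_1 \\ & \ddots & \ddots & \vdots \\ & & 1 & -a_{n-1} \end{bmatrix}.$$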


(7.9) Theorem. Let T be a linear operator on a finite-dimensional vector space V 
over a field F. There is a basis for V with respect to which the matrix of T is made up 
of blocks of the type (7.8). □ 


Such a form for the matrix of a linear operator is called a rational canonical 
form. It isn’t particularly nice, but it is the best form available for an arbitrary field. 
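
The blocks of type (7.8) are easy to generate and test. The sketch below is an illustration under stated assumptions: integer coefficients, and helper names `companion`, `mat_mul` and `eval_monic` that are not from the text. It builds the block from the coefficients a_0, ..., a_{n-1} of f and checks that f(T) is the zero matrix, as the relation f w_0 = 0 requires.

```python
def companion(a):
    """Block (7.8) for f = t^n + a[n-1]*t^(n-1) + ... + a[1]*t + a[0]."""
    n = len(a)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1        # T(w_{i-1}) = w_i
    for i in range(n):
        C[i][n - 1] = -a[i]    # T(w_{n-1}) = -a_0 w_0 - ... - a_{n-1} w_{n-1}
    return C


def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def eval_monic(a, C):
    """Evaluate f(C) for the monic polynomial with lower coefficients a."""
    n = len(C)
    result = [[0] * n for _ in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # C^0
    for c in list(a) + [1]:
        result = [[result[i][j] + c * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = mat_mul(power, C)
    return result


# Hypothetical example: f = t^3 - 4t^2 + 5t - 2.
a = [-2, 5, -4]
C = companion(a)
print(C)
print(eval_monic(a, C))   # the zero matrix
```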

For example, the module (6.14) is a direct sum of two modules. Its rational 
canonical form is 

(7.10)
$$\begin{bmatrix} 0 & -1 & \\ 1 & 2 & \\ & & 2 \end{bmatrix}.$$


We now consider more carefully the case that F is the field of complex 
numbers. Every irreducible polynomial in C[t] is linear, p(t) = t − a, so according to 
Theorem (6.12), every finite-dimensional C[t]-module is a direct sum of 
submodules isomorphic to ones of the form 

(7.11)    $W = C[t]/(t - a)^n.$

We let $w_0$ denote the residue of 1 in W as before, but we make a different choice of 
basis for W this time, setting $w_i = (t - a)^i w_0$. Then 

$$(t-a)w_0 = w_1,\quad (t-a)w_1 = w_2,\quad \ldots,\quad (t-a)w_{n-2} = w_{n-1},\quad \text{and}\quad (t-a)w_{n-1} = 0.$$

We replace t by T and solve, obtaining 

$$Tw_i = w_{i+1} + aw_i$$

for i = 0, ..., n − 2, and 

$$Tw_{n-1} = aw_{n-1}.$$

The matrix of T has the form 

(7.12)
$$\begin{bmatrix} a \end{bmatrix},\quad
\begin{bmatrix} a & 0 \\ 1 & a \end{bmatrix},\quad
\begin{bmatrix} a & 0 & 0 \\ 1 & a & 0 \\ 0 & 1 & a \end{bmatrix},\quad \ldots,\quad
\begin{bmatrix} a & & & \\ 1 & a & & \\ & \ddots & \ddots & \\ & & 1 & a \end{bmatrix}.$$


These matrices are called Jordan blocks. Thus we obtain the following theorem: 


(7.13) Theorem. Let T: V → V be a linear operator on a finite-dimensional 
complex vector space. There is a basis of V such that the matrix of T with respect to 
this basis is made up of Jordan blocks. □ 


Such a matrix is said to be in Jordan form, or to be a Jordan matrix. Note that it is 
lower triangular, so the diagonal entries are its eigenvalues. Jordan form is much 
nicer than rational canonical form. 

It is not hard to show that every Jordan block has a unique eigenvector, up to 
multiplication by a nonzero scalar. 

Given any square complex matrix A, the theorem asserts that $PAP^{-1}$ is in 
Jordan form for some invertible matrix P. We often refer to $PAP^{-1}$ as "the Jordan form 
for A." It is unique up to permutation of the blocks, because the terms in the direct 
sum decomposition are unique, though we have not proved this. 
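
The claim about eigenvectors can be checked on an example. The sketch below is illustrative only: the names `jordan_block` and `mat_vec` are assumptions, and integer entries are used for simplicity. It builds the block (7.12) from the rules $Tw_i = w_{i+1} + aw_i$ and $Tw_{n-1} = aw_{n-1}$, and applies J − aI to a vector.

```python
def jordan_block(a, n):
    """The n x n Jordan block (7.12): a on the diagonal, 1 just below it."""
    J = [[0] * n for _ in range(n)]
    for i in range(n):
        J[i][i] = a
        if i > 0:
            J[i][i - 1] = 1   # column i-1 records T(w_{i-1}) = w_i + a*w_{i-1}
    return J


def mat_vec(M, v):
    """Multiply the matrix M (a list of rows) by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


a, n = 3, 4
J = jordan_block(a, n)

# The last basis vector w_{n-1} is an eigenvector with eigenvalue a.
last = [0, 0, 0, 1]
print(mat_vec(J, last))

# (J - aI) sends (x0, x1, x2, x3) to (0, x0, x1, x2), so its kernel
# consists of the multiples of w_{n-1} alone.
x = [1, 2, 3, 4]
print([y - a * xi for y, xi in zip(mat_vec(J, x), x)])
```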

The Jordan form of the module (6.14) is made up of two Jordan blocks: 

(7.14)
$$\begin{bmatrix} 1 & & \\ 1 & 1 & \\ & & 2 \end{bmatrix}.$$

One important application of Jordan form is to the explicit solution of systems 
of first-order linear differential equations 

(7.15)    $\dfrac{dX}{dt} = AX.$

As we saw in Chapter 4 (7.11), the problem of solving this equation reduces easily 
to solving the equation $d\tilde{X}/dt = \tilde{A}\tilde{X}$, where $\tilde{A} = PAP^{-1}$ is any similar matrix. So 
provided that we can determine the Jordan form $\tilde{A}$ of the given matrix A, it is enough to 
solve the resulting system. This in turn reduces to the case of a single Jordan block. 
One example of a 2 × 2 Jordan block was computed in Chapter 4 (8.18). 

The solutions for an arbitrary k × k Jordan block A can be determined by 
computing the matrix exponential. We denote by N the k × k matrix obtained by 
substituting a = 0 into (7.12). Then $N^k = 0$. Hence 

$$e^{Nt} = I + Nt/1! + \cdots + N^{k-1}t^{k-1}/(k-1)!.$$

This is a lower triangular matrix which is constant on diagonal bands and whose 
entries on the ith diagonal band below the diagonal are $t^i/i!$. Since N and aI 
commute, 

$$e^{At} = e^{at}e^{Nt} = e^{at}\bigl(I + Nt/1! + \cdots + N^{k-1}t^{k-1}/(k-1)!\bigr).$$

Thus if A is the matrix 

$$A = \begin{bmatrix} 3 & & \\ 1 & 3 & \\ & 1 & 3 \end{bmatrix},$$

then 

$$e^{At} = e^{3t}\begin{bmatrix} 1 & & \\ t & 1 & \\ \tfrac{1}{2}t^2 & t & 1 \end{bmatrix}
= \begin{bmatrix} e^{3t} & & \\ te^{3t} & e^{3t} & \\ \tfrac{1}{2}t^2e^{3t} & te^{3t} & e^{3t} \end{bmatrix}.$$

Theorem (8.14) of Chapter 4 tells us that the columns of this matrix form a basis for 
the space of solutions of the differential equation (7.15). 
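
The finite sum for $e^{Nt}$ can be evaluated exactly at a rational value of t. The following sketch makes some assumptions for illustration: exact arithmetic with `fractions.Fraction`, and helper names `nilpotent`, `mat_mul` and `exp_nilpotent` that are not from the text. The scalar factor $e^{at}$ is left out, since it is the only part that is not a polynomial in t.

```python
from fractions import Fraction
from math import factorial


def nilpotent(k):
    """The k x k matrix N obtained by substituting a = 0 into (7.12)."""
    return [[Fraction(1) if i == j + 1 else Fraction(0) for j in range(k)]
            for i in range(k)]


def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def exp_nilpotent(k, t):
    """I + N t/1! + ... + N^(k-1) t^(k-1)/(k-1)!, evaluated at the number t."""
    N = nilpotent(k)
    total = [[Fraction(0)] * k for _ in range(k)]
    power = [[Fraction(1) if i == j else Fraction(0) for j in range(k)]
             for i in range(k)]  # N^0
    for j in range(k):
        coeff = Fraction(t) ** j / factorial(j)
        total = [[total[r][c] + coeff * power[r][c] for c in range(k)]
                 for r in range(k)]
        power = mat_mul(power, N)
    return total


# For k = 3 and t = 2 the bands below the diagonal hold t = 2 and t^2/2 = 2.
for row in exp_nilpotent(3, 2):
    print([str(entry) for entry in row])
```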

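Computing the Jordan form of a given matrix requires finding the roots of its 
characteristic polynomial p(t). If the roots $a_1, \ldots, a_n$ are distinct, the Jordan form is 
diagonal: 

$$\begin{bmatrix} a_1 & & \\ & \ddots & \\ & & a_n \end{bmatrix}.$$

Suppose that the root $a_1 = a$ is an r-fold root of p(t). Then there are various 
possibilities for the part of the Jordan matrix with diagonal entries a. Here are the 
possibilities for small r: 

r = 1:
$$\begin{bmatrix} a \end{bmatrix};$$

r = 2:
$$\begin{bmatrix} a & \\ 1 & a \end{bmatrix},\quad
\begin{bmatrix} a & \\ & a \end{bmatrix};$$

r = 3:
$$\begin{bmatrix} a & & \\ 1 & a & \\ & 1 & a \end{bmatrix},\quad
\begin{bmatrix} a & & \\ 1 & a & \\ & & a \end{bmatrix},\quad
\begin{bmatrix} a & & \\ & a & \\ & & a \end{bmatrix};$$

r = 4:
$$\begin{bmatrix} a & & & \\ 1 & a & & \\ & 1 & a & \\ & & 1 & a \end{bmatrix},\quad
\begin{bmatrix} a & & & \\ 1 & a & & \\ & 1 & a & \\ & & & a \end{bmatrix},\quad
\begin{bmatrix} a & & & \\ 1 & a & & \\ & & a & \\ & & 1 & a \end{bmatrix},\quad
\begin{bmatrix} a & & & \\ 1 & a & & \\ & & a & \\ & & & a \end{bmatrix},\quad
\begin{bmatrix} a & & & \\ & a & & \\ & & a & \\ & & & a \end{bmatrix}.$$

They can be distinguished by computing eigenvectors of certain operators related to 
T. The space of solutions to the system of equations 

$$(A - aI)X = 0$$

is the space of eigenvectors of A with eigenvalue a. One can solve this system 
explicitly, given A and a. If r = 4, the dimensions of the solution space in the five 
cases shown above are 1, 2, 2, 3, 4 respectively, because one eigenvector is 
associated to each block. So this dimension distinguishes all cases except the second and 
third. These remaining two cases can be distinguished by the matrix $(A - aI)^2$. It is 
zero in case three and not zero in case two. 

It can be shown that the dimensions of the null spaces of the operators 
$(A - aI)^\nu$, ν = 1, 2, ..., r/2, distinguish the Jordan forms in all cases. 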
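
The distinguishing dimensions for r = 4 can be computed directly. The sketch below is an illustration under stated assumptions: exact rational arithmetic, Jordan matrices described by their list of block sizes, and helper names (`jordan_matrix`, `rank`, `nullities`) that are not from the text. It prints the dimensions of the null spaces of $(A - aI)^\nu$ for ν = 1, 2 in each of the five cases.

```python
from fractions import Fraction


def jordan_matrix(a, sizes):
    """Jordan matrix with eigenvalue a and lower triangular blocks of the given sizes."""
    n = sum(sizes)
    A = [[Fraction(0)] * n for _ in range(n)]
    start = 0
    for s in sizes:
        for i in range(start, start + s):
            A[i][i] = Fraction(a)
            if i > start:
                A[i][i - 1] = Fraction(1)
        start += s
    return A


def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def rank(M):
    """Rank of M, computed by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def nullities(A, a, max_power):
    """Dimensions of the null spaces of (A - aI)^v for v = 1, ..., max_power."""
    n = len(A)
    B = [[A[i][j] - (a if i == j else 0) for j in range(n)] for i in range(n)]
    power, dims = B, []
    for _ in range(max_power):
        dims.append(n - rank(power))
        power = mat_mul(power, B)
    return dims


cases = [[4], [3, 1], [2, 2], [2, 1, 1], [1, 1, 1, 1]]
for sizes in cases:
    print(sizes, nullities(jordan_matrix(7, sizes), 7, 2))
```

The first dimension counts the blocks, and the second separates the cases [3, 1] and [2, 2], in agreement with the discussion above.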


8. FREE MODULES OVER POLYNOMIAL RINGS 


The structures of modules over a ring become increasingly complicated with increas- 
ing complication of the ring. It is even difficult to determine whether or not an ex- 
plicitly presented module is free. In this section we describe, without proof, a theo- 
rem which characterizes free modules over polynomial rings. This theorem was 
proved by Quillen and Suslin in 1976. 

Let $R = C[x_1, \ldots, x_k]$ be the polynomial ring in k variables, and let V be a 
finitely generated R-module. We choose a presentation matrix A for the module. The 
entries of A will be polynomials $a_{ij}(x)$, and if A is an m × n matrix, then V is 
isomorphic to the cokernel $R^m/AR^n$ of multiplication by A on R-vectors. We can evaluate 
the matrix entries $a_{ij}(x)$ at any point $p = (p_1, \ldots, p_k)$ of $C^k$, obtaining a complex 
matrix A(p) whose i, j-entry is $a_{ij}(p)$. 


(8.1) Theorem. Let V be a finitely generated module over the polynomial ring 
$C[x_1, \ldots, x_k]$, and let A be an m × n presentation matrix for V. Denote by A(p) the 
evaluation of A at a point $p \in C^k$. Then V is a free module of rank r if and only if 
A(p) has rank m − r for every point p. □ 


The proof of this theorem requires background which we don't have. 
However, we can easily see how to use it to determine whether or not a given module is 
free. For example, consider the polynomial ring in two variables: R = C[x, y]. Let 
V be the module presented by the 4 × 2 matrix 

(8.2)
$$A = \begin{bmatrix} 1 & x \\ y & x+3 \\ x & y \\ x^2 & y^2 \end{bmatrix}.$$

So V has four generators and two relations. Let p be a point $(a, b) \in C^2$. The two 
columns of the matrix A(p) are 

$$v_1 = (1, b, a, a^2)^t,\quad v_2 = (a, a+3, b, b^2)^t.$$

It is not hard to show that these two vectors are linearly independent for every 
choice of a, b, from which it follows that the rank of A(p) is 2 for every point (a, b). 
For suppose that the vectors are dependent: $v_2 = cv_1$, or vice versa. Then the first 
coordinates show that $v_2 = av_1$, hence 

(8.3)    $a + 3 = ab,\quad b = a^2,\quad b^2 = a^3.$

These equations have no common solutions. Indeed, the last two give $a^4 = a^3$, so 
a = 0 or a = 1; then b = a, and the first equation fails in both cases. By Theorem (8.1), 
V is a free module of rank 2. 
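
A numerical spot check of this example is easy to set up, though it proves nothing by itself. The sketch below is an illustration with assumed helper names (`columns`, `has_rank_two`). It evaluates the two columns of A(p) at points whose coordinates are small Gaussian integers and tests whether some 2 × 2 minor is nonzero; for such small integer values the floating-point arithmetic of Python's `complex` type is exact.

```python
from itertools import product


def columns(a, b):
    """The two columns of A(p) for the matrix (8.2) at the point p = (a, b)."""
    v1 = (1, b, a, a * a)
    v2 = (a, a + 3, b, b * b)
    return v1, v2


def has_rank_two(a, b):
    """True if some 2 x 2 minor of A(p) is nonzero, so that A(p) has rank 2."""
    v1, v2 = columns(a, b)
    return any(v1[i] * v2[j] - v1[j] * v2[i] != 0
               for i in range(4) for j in range(i + 1, 4))


# Sample points with real and imaginary parts between -3 and 3.
grid = [complex(re, im) for re in range(-3, 4) for im in range(-3, 4)]
print(all(has_rank_two(a, b) for a, b in product(grid, grid)))

# The two points singled out by the equations b = a^2, b^2 = a^3.
print(has_rank_two(0, 0), has_rank_two(1, 1))
```

The two points tested at the end are the only ones at which the last two equations of (8.3) hold, so they are the only places where the rank could have dropped.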

We can get an intuitive understanding for this theorem by considering the 
vector space $V_p = C^m/A(p)C^n$ which is presented by the complex matrix A(p). It is 
natural to think of this vector space as a kind of "evaluation of the module V at the 
point p," and it can be shown that $V_p$ is essentially independent of the choice of the 
presentation matrix. Therefore we can use the module V to associate a vector space 
$V_p$ to every point $p \in C^k$. If we imagine moving the point p about, then the vector 
space $V_p$ will vary in a continuous way, provided that its dimension does not jump 
around. This is because the matrix A(p) presenting $V_p$ depends continuously on p. 
Families of vector spaces of constant dimension, parametrized by a topological 
space, are called vector bundles. The module is free if and only if the family of 
vector spaces $V_p$ forms a vector bundle. 


“Par une déformation coutumière aux mathématiciens, 
je m'en tenais au point de vue trop restreint.” 


Jean-Louis Verdier 


EXERCISES 
1. The Definition of a Module 


1. Let R be a ring, considered as an R-module. Determine all module homomorphisms 
φ: R → R. 

2. Let W be a submodule of an R-module V. Prove that the additive inverse of an element of 
W is in W. 

3. Let φ: V → W be a homomorphism of modules over a ring R, and let V', W' be 
submodules of V, W respectively. Prove that φ(V') is a submodule of W and that $\varphi^{-1}(W')$ is 
a submodule of V. 

4. (a) Let V be an abelian group. Prove that if V has a structure of Q-module with its given 
law of composition as addition, then this structure is uniquely determined. 

(b) Prove that no finite abelian group has a Q-module structure. 

5. Let R = Z[α], where α is an algebraic integer. Prove that for any integer m, R/mR is 
finite, and determine its order. 

6. A module is called simple if it is not the zero module and if it has no proper submodule. 
(a) Prove that any simple module is isomorphic to R/M, where M is a maximal ideal. 
(b) Prove Schur's Lemma: Let φ: S → S' be a homomorphism of simple modules. 
Then either φ is zero, or else it is an isomorphism. 

7. The annihilator of an R-module V is the set I = {r ∈ R | rV = 0}. 
(a) Prove that I is an ideal of R. 
(b) What is the annihilator of the Z-module Z/(2) × Z/(3) × Z/(4)? of the Z-module 
Z? 

8. Let R be a ring and V an R-module. Let E be the set of endomorphisms of V, meaning 
the set of homomorphisms from V to itself. Prove that E is a noncommutative ring, with 
composition of functions as multiplication and with addition defined by 
[φ + ψ](m) = φ(m) + ψ(m). 

9. Prove that the ring of endomorphisms of a simple module is a field. 

10. Determine the ring of endomorphisms of the R-module (a) R and (b) R/I, where I is an 
ideal. 

11. Let W ⊂ V ⊂ U be R-modules. 
(a) Describe natural homomorphisms which relate the three quotient modules U/W, 
U/V, and V/W. 
(b) Prove the Third Isomorphism Theorem: U/V ≈ (U/W)/(V/W). 

12. Let V, W be submodules of a module U. 
(a) Prove that V ∩ W and V + W are submodules. 
(b) Prove the Second Isomorphism Theorem: (V + W)/W is isomorphic to V/(V ∩ W). 

13. Let V be an R-module, defined as in (1.1). If the ring R is not commutative, it is not a 
good idea to define vr = rv. Explain. 


2. Matrices, Free Modules, and Bases 


1. Let R = C[x, y], and let M be the ideal of R generated by the two elements (x, y). Prove 
or disprove: M is a free R-module. 

2. Let A be an n × n matrix with coefficients in a ring R, let φ: $R^n \to R^n$ be left 
multiplication by A, and let d = det A. Prove or disprove: The image of φ is equal to $dR^n$. 

3. Let I be an ideal of a ring R. Prove or disprove: If R/I is a free R-module, then I = 0. 

4. Let R be a ring, and let V be a free R-module of finite rank. Prove or disprove: 
(a) Every set of generators contains a basis. 
(b) Every linearly independent set can be extended to a basis. 

5. Let I be an ideal of a ring R. Prove that I is a free R-module if and only if it is a principal 
ideal, generated by an element α which is not a zero divisor in R. 

6. Prove that a ring R such that every finitely generated R-module is free is either a field or 
the zero ring. 

7. Let A be the matrix of a homomorphism φ: $Z^n \to Z^m$ between free modules. 
(a) Prove that φ is injective if and only if the rank of A is n. 
(b) Prove that φ is surjective if and only if the greatest common divisor of the 
determinants of the m × m minors of A is 1. 

8. Reconcile the definition of free abelian group given in Section 2 with that given in 
Chapter 6, Section 8. 


3. The Principle of Permanence of Identities 


1. In each case, decide whether or not the principle of permanence of identities allows the 
result to be carried over from the complex numbers to an arbitrary commutative ring. 
(a) the associative law for matrix multiplication 
(b) Cayley-Hamilton Theorem 
(c) Cramer's Rule 
(d) product rule, quotient rule, and chain rule for differentiation of polynomials 
(e) the fact that a polynomial of degree n has at most n roots 
(f) Taylor's expansion of a polynomial 

2. Does the principle of permanence of identities show that det AB = det A det B when the 
entries of the matrices are in a noncommutative ring R? 

3. In some cases, it may be convenient to verify an identity only for the real numbers. Does 
this suffice? 

4. Let R be a ring, and let A be a 3 × 3 R-matrix in $SO_3(R)$, that is, such that $A^tA = I$ and 
det A = 1. Does the principle of permanence of identities show that A has an eigenvector 
in $R^3$ with eigenvalue 1? 


4. Diagonalization of Integer Matrices 


1. Reduce each matrix below to diagonal form by integer row and column operations. 


3 chee 
(a) is | (b) lt é ;| @| 2-3 1 
4 6 —2 
(d) In the first case, let V = $Z^2$ and let L = AV. Draw the sublattice L, and find 
commensurable bases of V and L. 

2. Let A be a matrix whose entries are in the polynomial ring F[t], and let A' be obtained 
from A by polynomial row and column operations. Relate det A to det A'. 

3. Determine integer matrices $P^{-1}$, Q which diagonalize the matrix A = i : Al 

4. Let $d_1, d_2, \ldots$ be the integers referred to in Theorem (4.3). 
(a) Prove that $d_1$ is the greatest common divisor of the entries $a_{ij}$ of A. 
(b) Prove that $d_1d_2$ is the greatest common divisor of the determinants of the 2 × 2 
minors of A. 
(c) State and prove an extension of (a) and (b) to $d_i$ for arbitrary i. 


5. Determine all integer solutions to the system of equations AX = 0, when 


he @ 
ee, a 


6. Find a basis for the following submodules of $Z^3$. 
(a) The module generated by (1, 0, −1), (2, −3, 1), (0, 3, 1), (3, 1, 5). 
(b) The module of solutions of the system of equations x + 2y + 3z = 0, 
x + 4y + 9z = 0. 


7. Prove that the two matrices $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ generate the group $SL_2(Z)$ of integer 
matrices with determinant 1. 

8. Prove that the group $SL_n(Z)$ is generated by elementary integer matrices of the first type. 

9. Let α, β, γ be complex numbers, and let A = {ℓα + mβ + nγ | ℓ, m, n ∈ Z} be the 
subgroup of $C^+$ they generate. Under what conditions is A a lattice in C? 

10. Let φ: $Z^k \to Z^k$ be a homomorphism given by multiplication by an integer matrix A. 
Show that the image of φ is of finite index if and only if A is nonsingular and that if so, 
then the index is equal to |det A|. 

11. (a) Let $A = (a_1, \ldots, a_n)^t$ be an integer column vector. Use row reduction to prove that 
there is a matrix $P \in GL_n(Z)$ such that $PA = (d, 0, \ldots, 0)^t$, where d is the greatest 
common divisor of $a_1, \ldots, a_n$. 
(b) Prove that if d = 1, then A is the first column of a matrix $M \in SL_n(Z)$. 


5. Generators and Relations for Modules 


1. In each case, identify the abelian group which has the given presentation matrix: 
1 0 


‘| S| 1 Zuesal, | 2 a4iiel 46 
cies OOF; | 0 lf, f : : 
1 al linea? pew (2 93 

4 0 0 
2. Find a ring R and an ideal I of R which is not finitely generated. 

3. Prove that existence of factorization holds in a noetherian integral domain. 

4. Let $V \subset C^n$ be the locus of zeros of an infinite set of polynomials $f_1, f_2, f_3, \ldots$. Prove 
that there is a finite subset of these polynomials whose zeros define the same locus. 

5. Let S be a subset of $C^n$. Prove that there is a finite set of polynomials $(f_1, \ldots, f_k)$ such 
that any polynomial which vanishes identically on S is a linear combination of this set, 
with polynomial coefficients. 

6. Determine a presentation matrix for the ideal (2, 1 + δ) of Z[δ], where δ = √−5. 

*7. Let S be a subring of the ring R = C[t] which contains C and is not equal to C. Prove 
that R is a finitely generated S-module. 

8. Let A be the presentation matrix of a module V with respect to a set of generators 
$(v_1, \ldots, v_m)$. Let $(w_1, \ldots, w_r)$ be another set of elements of V, and write the elements in 
terms of the generators, say w; = 2pivj, py € R. Let P = (pi). Prove that the block 


ed 


: A IP || 5 . : 
matrix sie is a presentation matrix for V with respect to the set of generators 


0 
(Wino pees) 
*9, With the notation of the previous problem, suppose that (w,,..., w,) is also a set of gen- 


erators of V and that B is a presentation matrix for V with respect to this set of generators. 
Say that v; = 2qyw; is an expression of the generators v; in terms of the wy. 
a wa 
' moe ee ; 

(a) Prove that the block matrix M@ = tenia 48) presents V with respect te the 
generators (v1,..., Um} Wis-.:, Wr). 

(b) Show that M can be reduced to A and to B by a sequence of operations of the form 
(Same), 

10. Using 9, show that any presentation matrix of a module can be transformed to any other 
by a sequence of operations (5.12) and their inverses. 


6. The Structure Theorem for Abelian Groups 


1. Find a direct sum of cyclic groups which is isomorphic to the abelian group presented by 
the matrix $\begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 2 \end{bmatrix}$. 


2. Write the group generated by x, y, with the relation 3x + 4y = 0 as a direct sum of 
cyclic groups. 

3. Find an isomorphic direct product of cyclic groups, when V is the abelian group 
generated by x, y, z, with the given relations. 
(a) 3x + 2y + 8z = 0, 2x + 4z = 0 
(b) x + y = 0, 2x = 0, 4x + 2z = 0, 4x + 2y + 2z = 0 
(Cede tov =Oex — y +382 =O 

(d) 2x —4y = 0, 2x +2y+2z=0 

OTe ct Sveti 2z =O, 3x SVP 3x dla th.2z.>.0 


4. Determine the number of isomorphism classes of abelian groups of order 400. 

5. Classify finitely generated modules over each ring. 
(a) Z/(4)   (b) Z/(6)   (c) Z/nZ. 

6. Let R be a ring, and let V be an R-module, presented by a diagonal m × n matrix A: 
$V \approx R^m/AR^n$. Let $(v_1, \ldots, v_m)$ be the corresponding generators of V, and let $d_i$ be the 
diagonal entries of A. Prove that V is isomorphic to a direct product of the modules $R/(d_i)$. 

7. Let V be the Z[i]-module generated by elements $v_1, v_2$ with relations 
$(1 + i)v_1 + (2 - i)v_2 = 0$, $3v_1 + 5iv_2 = 0$. Write this module as a direct sum of cyclic modules. 

8. Let $W_1, \ldots, W_k$ be submodules of an R-module V such that $V = \sum W_i$. Assume that 
$W_1 \cap W_2 = 0$, $(W_1 + W_2) \cap W_3 = 0$, ..., $(W_1 + W_2 + \cdots + W_{k-1}) \cap W_k = 0$. Prove 
that V is the direct sum of the modules $W_1, \ldots, W_k$. 

9. Prove the following. 
(a) The number of elements of $Z/(p^e)$ whose order divides $p^\nu$ is $p^\nu$ if ν ≤ e, and is $p^e$ 
if ν ≥ e. 
(b) Let $W_1, \ldots, W_k$ be finite abelian groups, and let $u_j$ denote the number of elements of 
$W_j$ whose order divides a given integer q. Then the number of elements of the 
product group $V = W_1 \times \cdots \times W_k$ whose order divides q is $u_1 \cdots u_k$. 
(c) With the above notation, assume that $W_j$ is a cyclic group of prime power order 
$d_j = p^{e_j}$. Let $r_1$ be the number of $d_j$ equal to a given prime p, let $r_2$ be the number of 
$d_j$ equal to $p^2$, and so on. Then the number of elements of V whose order divides $p^\nu$ 
is $p^{s_\nu}$, where $s_1 = r_1 + \cdots + r_e$, $s_2 = r_1 + 2r_2 + \cdots + 2r_e$, 
$s_3 = r_1 + 2r_2 + 3r_3 + \cdots + 3r_e$, and so on. 
(d) Theorem (6.9). 


7. Application to Linear Operators 


1. Let T be a linear operator whose matrix is iG i Is the corresponding C[t]-module 
cyclic? 


2. Determine the Jordan form of the matrix $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$. 

3. Prove that $\begin{bmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 1 & 1 & 0 \end{bmatrix}$ is a nilpotent matrix, and find its Jordan form. 


4. Let V be a complex vector space of dimension 5, and let T be a linear operator on V 
which has characteristic polynomial $(t - a)^5$. Suppose that the rank of the operator 
T − aI is 2. What are the possible Jordan forms for T? 


5. Find all possible Jordan forms for a matrix whose characteristic polynomial is 
$(t + 2)^2(t - 5)^3$. 

6. What is the Jordan form of a matrix whose characteristic polynomial is $(t - 2)^2(t - 5)^3$ 
and such that the space of eigenvectors with eigenvalue 2 is one-dimensional, while the 
space of eigenvectors with eigenvalue 5 is two-dimensional? 

7. (a) Prove that a Jordan block has a one-dimensional space of eigenvectors. 
(b) Prove that, conversely, if the eigenvectors of a complex matrix A are multiples of a 
single vector, then the Jordan form for A consists of one block. 

8. Determine all invariant subspaces of a linear operator whose Jordan form consists of one 
block. 

9. In each case, solve the differential equation dX/dt = AX when A is the Jordan block 
given. 

1 


2 0 0 
(a) fe ,| (b) iF 4 (c) }1 


10. Solve the differential equation dX/dt = AX when A is (a) the matrix (6.14), (b) the 
matrix (6.10), (c) the matrix of problem 2, (d) the matrix of problem 3. 

11. Prove or disprove: Two complex n × n matrices A, B are similar if and only if they have 
the same Jordan form. 

12. Show that every complex n × n matrix is similar to a matrix of the form D + N, where D 
is diagonal, N is nilpotent, and DN = ND. 

13. Let R = F[x] be the polynomial ring in one variable over a field F, and let V be the 
R-module generated by an element v which satisfies the relation $(x^2 + 3x + 2)v = 0$. 
Choose a basis for V as F-vector space, and find the matrix of the operator multiplication 
by x with respect to this basis. 

14. Let V be an F[t]-module, and let $B = (v_1, \ldots, v_n)$ be a basis for V, as F-vector space. Let 
B be the matrix of T with respect to this basis. Prove that A = tI − B is a presentation 
matrix for the module. 

15. Let p(t) be a polynomial over a field F. Prove that there exists an n × n matrix with 
entries in F whose characteristic polynomial is p(t). 

16. Prove or disprove: A complex matrix A such that $A^2 = A$ is diagonalizable. 

17. Let A be a complex n × n matrix such that $A^k = I$ for some k. Prove that the Jordan form 
for A is diagonal. 

18. Prove the Cayley-Hamilton Theorem, that if p(t) is the characteristic polynomial of an 
n × n matrix A, then p(A) = 0. 

19. The minimal polynomial m(t) of a linear operator T on a complex vector space V is the 
polynomial of lowest degree such that m(T) = 0. 
(a) Prove that the minimal polynomial divides the characteristic polynomial. 
(b) Prove that every root of the characteristic polynomial p(t) is also a root of the 
minimal polynomial m(t). 
(c) Prove that T is diagonalizable if and only if m(t) has no multiple root. 


20. Find all possible Jordan forms for 8 × 8 matrices whose minimal polynomial is 
$t^2(t - 1)^3$. 

21. Prove or disprove: A complex matrix A is similar to its transpose. 

22. Classify linear operators on a finitely generated F[t]-module, dropping the assumption 
that the module is finite-dimensional as a vector space. 

23. Prove that the ranks of $(A - aI)^\nu$ distinguish all Jordan forms, and hence that the Jordan 
form depends only on the operator and not on the basis. 

24. Show that the following concepts are equivalent: 
(i) R-module, where R = Z[i]; 
(ii) abelian group V, together with a homomorphism φ: V → V such that 
φ∘φ = −identity. 

25. Let $F = F_p$. For which prime integers p does the additive group $F^1$ have a structure of 
Z[i]-module? How about $F^2$? 

26. Classify finitely generated modules over the ring C[ε], where $\varepsilon^2 = 0$. 


8. Free Modules over Polynomial Rings 


1. Determine whether or not the modules over C[x, y] presented by the following matrices 
are free. 
PG Dood 
; x x ie y yt 
(a) aoe se (b) | x y Oy], y 
y 5 aes 


2. Prove that the module presented by (8.2) is free by exhibiting a basis. 

3. Following the model of the polynomial ring in one variable, describe modules over the 
ring C[x, y] in terms of real vector spaces with additional structure. 

4. Let R be a ring and V an R-module. Let I be an ideal of R, and let IV be the set of finite 
sums $\sum s_iv_i$, where $s_i \in I$ and $v_i \in V$. 
(a) Show how to make V/IV into an R/I-module. 
(b) Let A be a presentation matrix for V, and let $\bar{A}$ denote its residue in R/I. Prove that $\bar{A}$ 
is a presentation matrix for V/IV. 
(c) Show why the module $V_p$ defined in the text is essentially independent of the 
presentation matrix. 

5. Using exercise 9 of Section 5, prove the easy half of the theorem of Quillen and Suslin: 
If V is free, then the rank of A(p) is constant. 

6. Let R = Z[√−5], and let V be the module presented by the matrix A = CA 
(a) Prove that the residue of A has rank 1 for every prime ideal P of R. 
(b) Prove that V is not free. 


Miscellaneous Problems 


1. Let G be a lattice group, and let g be a rotation in G. Let $\bar{g}$ be the associated element of 
the point group $\bar{G}$. Prove that there is a basis for $R^2$, not necessarily an orthonormal 
basis, such that the matrix of $\bar{g}$ with respect to this basis is in $GL_2(Z)$. 

2. (a) Let α be a complex number, and let Z[α] be the subring of C generated by α. Prove 
that α is an algebraic integer if and only if Z[α] is a finitely generated abelian group. 
(b) Prove that if α, β are algebraic integers, then the subring Z[α, β] of C which they 
generate is a finitely generated abelian group. 
(c) Prove that the algebraic integers form a subring of C. 

3. Pick's Theorem: Let Δ be the plane region bounded by a polygon whose vertices are at 
integer lattice points. Let I be the set of lattice points in the interior of Δ and B the set 
of lattice points on the boundary of Δ. If p is a lattice point, let r(p) denote the fraction 
of 2π of the angle subtended by Δ at p. So r(p) = 0 if p ∉ Δ, r(p) = 1 if p is an 
interior point of Δ, r(p) = 1/2 if p is on an edge, and so on. 
(a) Prove that the area of Δ is $\sum_p r(p)$. 
(b) Prove that the area is |I| + (1/2)(|B| − 2) if Δ has a single connected boundary curve. 


4. Prove that the integer orthogonal group $O_n(Z)$ is a finite group. 

5. Consider the space $V = R^k$ of column vectors as an inner product space, with the 
ordinary dot product $(v \cdot w) = v^tw$. Let L be a lattice in V, and define L* = 
{w | (v · w) ∈ Z for all v ∈ L}. 
(a) Show that L* is a lattice. 
(b) Let $B = (v_1, \ldots, v_k)$ be a lattice basis for L, and let $P = [B]^{-1}$ be the matrix relating 
this basis of V to the standard basis E. What is the matrix A of dot product with 
respect to the basis B? 
(c) Show that the columns of P form a lattice basis for L*. 
(d) Show that if A is an integer matrix, then L ⊂ L*, and [L* : L] = |det A|. 

6. Let V be a real vector space having a countably infinite basis $\{v_1, v_2, v_3, \ldots\}$, and let E be 
the ring of linear operators on V. 
(a) Which infinite matrices represent linear operators on V? 
(b) Describe how to compute the matrix of the composition of two linear operators in 
terms of the matrix of each of them. 
(c) Consider the linear operators T, T' defined by the rules 

$$T(v_{2n}) = v_n,\quad T(v_{2n-1}) = 0,\quad T'(v_{2n}) = 0,\quad T'(v_{2n-1}) = v_n,\quad n = 1, 2, 3, \ldots.$$

Write down their matrices. 
(d) We can consider $E^1 = E$ as a module over the ring E, with scalar multiplication on 
the left side of a vector. Show that {T, T'} is a basis of $E^1$ as E-module. 
(e) Prove that the free E-modules $E^k$, k = 1, 2, 3, ..., are all isomorphic. 


7. Prove that the group $Q^+/Z^+$ is not an infinite direct sum of cyclic groups. 

8. Prove that the additive group $Q^+$ of rational numbers is not a direct sum of two proper 
subgroups. 

9. Prove that the multiplicative group $Q^\times$ of rational numbers is isomorphic to the direct 
sum of a cyclic group of order 2 and a free abelian group with countably many 
generators. 

10. Prove that two diagonalizable matrices are simultaneously diagonalizable, that is, that 
there is an invertible matrix P such that $PAP^{-1}$ and $PBP^{-1}$ are both diagonal, if and only 
if AB = BA. 

*11. Let A be a finite abelian group, and let φ: $A \to C^\times$ be a homomorphism which is not 
the trivial homomorphism (φ(x) = 1 for all x). Prove that $\sum_{a \in A} \varphi(a) = 0$. 


12. Let A be an m × n matrix with coefficients in a ring R, and let φ: $R^n \to R^m$ be left 
multiplication by A. Prove that the following are equivalent: 
(i) φ is surjective; 
(ii) the determinants of the m × m minors of A generate the unit ideal; 
(iii) A has a right inverse, a matrix B with coefficients in R such that AB = I. 


Let (v1,..., Um) be generators for an R-module V, and let J be an ideal of R. Define JV to 

be the set of all finite sums of products av, a € J, v € V. 

(a) Show that if JV = V, there is an n Xn matrix A with entries in J such that 
(U1,.-.,Um)(1 — A) = 0. 

(b) With the notation of (a), show that det(7 — A) = 1 + a, where a € J, and that 
det (7 — A) annihilates V. 

(c) An R-module V is called faithful if rv = 0 or r © R implies r = 0. Prove the 
Nakayama Lemma: Let V be a finitely generated, faithful R-module, and let J be an 
ideal of R. If JV = V, then/J = R. 

(d) Let V be a finitely generated R-module. Prove that if MV = V for all maximal ideals 
M, then V = O. 


We can use a pair x(r), y(t) of complex polynomials in t to define a complex path in C?, 

by sending tw» (x(t), y(r)). They also define a homomorphism g: C[x, y]—~ C[r], 

by f(x, vy) ~~ f(x(r), y(t)). This exercise analyzes the relationship between the path and 

the homomorphism. Let’s rule out the trivial case that x(t), y(t) are both constant. 

(a) Let S denote the image of ~. Prove that S is isomorphic to the quotient C[x, y]/(f), 
where f(x, y) is an irreducible polynomial. 

(b) Prove that ¢ is the root of a monic polynomial with coefficients in S. 

(c) Let V denote the variety of zeros of fin C*. Prove that for every point (x0, yo) € V, 
there is a fo © C such that (x0, yo) = (x (to), y(to)). 
Chapter 13

Fields


Our difficulty is not in the proofs, but in learning what to prove. 


Emil Artin 


1. EXAMPLES OF FIELDS 
Much of the theory of fields has to do with a pair F ⊂ K of fields, one contained in
the other. In contrast with group theory, where subgroups play an important role,
we usually consider K as an extension of F; that is, F is considered to be the basic
field, and K is related to it. An extension field of F is a field which contains F as a
subfield.

Here are the three most important classes of fields. 


(1.1) Number fields. A number field K is a subfield of C. 


Any subfield of C contains 1, and hence it contains the field Q of rational numbers.
So a number field is an extension of Q. The number fields most commonly studied
are algebraic number fields, all of whose elements are algebraic numbers (see
Chapter 10, Section 1). We studied quadratic number fields in Chapter 11.


(1.2) Finite fields. A field having finitely many elements is called a finite field.


If K is a finite field, then the kernel of the unique homomorphism φ: Z → K is a
prime ideal [Chapter 11 (7.15)], and since Z is infinite while K is finite, the kernel is
not zero. Therefore it is generated by a prime integer p. The image of φ is isomorphic
to the quotient Z/(p) = F_p. So K contains a subfield isomorphic to the prime
field F_p, and therefore it can be viewed as an extension of this prime field. We will
describe all finite fields in Section 6.




(1.3) Function fields. Certain extensions of the field F = C(x) of rational func- 
tions are called function fields. 


Function fields play an important role in the theory of analytic functions and in
algebraic geometry. Since we haven't seen them before, we will describe them briefly
here. A function field can be defined by an irreducible polynomial in two variables,
say f(x, y) ∈ C[x, y]. The polynomial f(x, y) = y^2 − x^3 + x is a good example.
Given such a polynomial f, we may study the equation


(1.4) f(x, y) = 0


analytically, using it to define y "implicitly" as a function y(x) of x as we learn to do
in calculus. In our example, the function defined in this way is y = √(x^3 − x). This
function isn't single valued; it is determined only up to sign, but that isn't a serious
difficulty. We won't have an explicit expression for such a function in general, but
by definition, it satisfies the equation (1.4), that is,


(1.5) f(x, y(x)) = 0.


On the other hand, the equation can also be studied algebraically. Let us interpret
f(x, y) as a polynomial in y whose coefficients are polynomials in x. Let F denote
the field C(x) of rational functions in x. If f is not a polynomial in x alone, then
since it is irreducible in C[x, y], it will be an irreducible element of F[y] [Chapter
11 (3.9)]. Therefore the ideal generated by f in F[y] is maximal [Chapter 11 (1.6)],
and F[y]/(f) = K is an extension field of F.

The analysis and the algebra are related, because both the implicitly defined
function y(x) and the residue ȳ of y in F[y]/(f) satisfy the equation f(x, y) = 0. In
this way, the residue of y, and indeed all elements of K, can be interpreted as
functions of the variable x. Because of this, such fields are called function fields. We will
discuss function fields in Section 7.


2. ALGEBRAIC AND TRANSCENDENTAL ELEMENTS 


Let K be an extension of a field F, and let α be an element of K. In analogy with the
definition of algebraic numbers (Chapter 10, Section 1), α is said to be algebraic
over F if it is the root of some nonzero polynomial with coefficients in F. Since the
coefficients are from a field, we may assume that the polynomial is monic, say


(2.1) x^n + a_{n−1}x^{n−1} + ··· + a_1 x + a_0, with a_i ∈ F.


An element α is called transcendental over F if it is not algebraic over F, that is, if
it is not a root of any such polynomial.

Note that the two properties, algebraic and transcendental, depend on the
given field F. For example, the complex number 2πi is algebraic over the field of
real numbers but transcendental over the field of rational numbers. Also, every
element α of a field K is algebraic over K, because it is the root of the polynomial
x − α, which has coefficients in K.
The two possibilities for α can be described in terms of the substitution
homomorphism


(2.2) φ: F[x] → K, which maps f(x) ↦ f(α).


The element α is transcendental over F if φ is injective and algebraic over F otherwise,
that is, if the kernel of φ is not zero.

Assume that α is algebraic over F. Since F[x] is a principal ideal domain,
ker φ is generated by a single element f(x), the monic polynomial of lowest degree
having α as a root. Since K is a field, we know that f(x) must be an irreducible
polynomial [Chapter 11 (7.15)], and in fact it will be the only irreducible monic
polynomial in the ideal. Every other element of the ideal is a multiple of f(x). We will call
this polynomial f the irreducible polynomial for α over F.

It is important to note that this irreducible polynomial f depends on F as well
as on α, because irreducibility of a polynomial depends on the field. For example,
let F = Q[i], and let α be the complex number √i = (1/2)√2 (1 + i). The irreducible
polynomial for α over Q is x^4 + 1, but this polynomial factors in the field
F: x^4 + 1 = (x^2 + i)(x^2 − i). The irreducible polynomial for α over F is x^2 − i.
When there are several fields around, we must be careful to make it clear to which
field we refer. To say that a polynomial is irreducible is ambiguous. It is better to say
that f is irreducible over F, or that it is an irreducible element of F[x].
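
The factorization in this example can be checked numerically. The following is only a sanity-check sketch, not part of the argument: the helper name `poly_mul` is our own, polynomials are stored as coefficient lists with the constant term first, and the root α is tested in floating point, so the comparison uses a tolerance.

```python
import cmath

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# (x^2 + i)(x^2 - i), with coefficients listed from the constant term upward.
product = poly_mul([1j, 0, 1], [-1j, 0, 1])

# alpha = (1/2) * sqrt(2) * (1 + i), a square root of i.
alpha = (2 ** 0.5 / 2) * (1 + 1j)

alpha_sq_is_i = cmath.isclose(alpha ** 2, 1j, abs_tol=1e-12)
root_of_quartic = cmath.isclose(alpha ** 4 + 1, 0, abs_tol=1e-12)
```

Here `product` should be the coefficient list of x^4 + 1, and both flags should be true: α is a root of x^2 − i, hence also of x^4 + 1.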

The field extension of F which is generated by an element α ∈ K will be
denoted by F(α):


(2.3) F(α) is the smallest field containing F and α.

More generally, if α_1, ..., α_n are elements of an extension field K of F, then the
notation F(α_1, ..., α_n) will stand for the smallest subfield of K which contains F and
these elements.


As in Chapter 10, we denote the ring generated by α over F by F[α]. It consists
of all elements of K which can be written as polynomials in α with coefficients
in F:

(2.4) a_n α^n + ··· + a_1 α + a_0, a_i ∈ F.


The field F(α) is isomorphic to the field of fractions of F[α]. Its elements are ratios
of elements of the form (2.4) [see Chapter 10 (6.7)].


(2.5) Proposition. If α is transcendental over F, then the map F[x] → F[α]
is an isomorphism, and hence F(α) is isomorphic to the field F(x) of rational
functions. □


This simple fact has the consequence that the field extensions F(α) are isomorphic
for all transcendental elements α, because they are all isomorphic to the field of
rational functions F(x). For instance, π and e are both transcendental over Q
(though we have not proved that they are). Therefore Q(π) and Q(e) are isomorphic
fields, the isomorphism carrying π to e. This is rather surprising at first glance. The
isomorphism is not continuous when the fields are regarded as subfields of the real
numbers.

The situation is quite different if α is algebraic:


(2.6) Proposition.


(a) Suppose that α is algebraic over F, and let f(x) be its irreducible polynomial
over F. The map F[x]/(f) → F[α] is an isomorphism, and F[α] is a field.
Thus F[α] = F(α).

(b) More generally, let α_1, ..., α_n be algebraic elements of a field extension K of F.
Then F[α_1, ..., α_n] = F(α_1, ..., α_n).


Proof. Let φ be the map (2.2), with K = F(α). Since f(x) generates ker φ,
we know that F[x]/(f) is isomorphic to the image of φ [Chapter 10 (3.1)], which is
F[α]. Since f is irreducible, it generates a maximal ideal [Chapter 11 (1.6)]. This
shows that F[α] is a field. Since F(α) is isomorphic to the fraction field of F[α], it
is equal to F[α]. We leave the proof of the second part as an exercise. □


(2.7) Proposition. Let α be an algebraic element over F, and let f(x) be its
irreducible polynomial. Suppose f(x) has degree n. Then (1, α, ..., α^{n−1}) is a basis for
F[α] as a vector space over F.


Proof. This proposition is a special case of (5.7) in Chapter 10. □


It may not be easy to tell whether or not two algebraic elements α, β generate
isomorphic fields, though we can use Proposition (2.7) to give a necessary condition:
Their irreducible polynomials over F must have the same degree, because this
degree is the dimension of the field extension as an F-vector space. This is obviously
not a sufficient condition. For example, all the imaginary quadratic fields studied in
Chapter 11 are obtained by adjoining elements δ whose irreducible polynomials
x^2 − d have degree 2, but they aren't all isomorphic. On the other hand, if α is a
root of x^3 − x + 1, then β = α^2 is a root of x^3 − 2x^2 + x − 1. The two fields
Q(α) and Q(β) are actually equal, though if we were presented only with the two
polynomials, it might take us some time to notice how they are related.
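
The claim about β = α^2 can be verified by computing in Q[x]/(x^3 − x + 1), where the residue of x plays the role of α. The sketch below does this with exact rational arithmetic; the names `reduce_mod` and `mul_mod` and the coefficient-list representation (constant term first) are illustrative choices of ours.

```python
from fractions import Fraction

# Modulus f(x) = x^3 - x + 1, stored lowest degree first.
F_MOD = [Fraction(1), Fraction(-1), Fraction(0), Fraction(1)]

def reduce_mod(p, f=F_MOD):
    """Reduce the polynomial p modulo the monic polynomial f."""
    p = [Fraction(c) for c in p]
    deg_f = len(f) - 1
    while len(p) > deg_f:
        lead = p.pop()          # coefficient of the top power x^k
        k = len(p)              # that power k, read off after popping
        for i in range(deg_f):
            # x^k = -(f_0 + f_1 x + ... ) * x^(k - deg_f) modulo f
            p[k - deg_f + i] -= lead * f[i]
    return p + [Fraction(0)] * (deg_f - len(p))

def mul_mod(p, q):
    """Multiply two residues in Q[x]/(f)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return reduce_mod(out)

alpha = [Fraction(0), Fraction(1), Fraction(0)]   # residue of x
beta = mul_mod(alpha, alpha)                      # beta = alpha^2
beta2 = mul_mod(beta, beta)
beta3 = mul_mod(beta2, beta)
one = [Fraction(1), Fraction(0), Fraction(0)]

# Coordinates of beta^3 - 2*beta^2 + beta - 1 in the basis (1, alpha, alpha^2).
value = [b3 - 2 * b2 + b1 - c for b3, b2, b1, c in zip(beta3, beta2, beta, one)]
```

All three coordinates of `value` vanish, which confirms that β is a root of x^3 − 2x^2 + x − 1.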

What we can describe easily are the circumstances under which there is an
isomorphism


(2.8) F(α) → F(β)


which fixes F and sends α to β. The following proposition is fundamental to our
understanding of field extensions:


(2.9) Proposition. Let α ∈ K and β ∈ L be algebraic elements of two extension
fields of F. There is an isomorphism of fields


σ: F(α) → F(β),

which is the identity on the subfield F and which sends α ↦ β if and only if the
irreducible polynomials for α and β over F are equal.


Proof. Assume that f(x) is the irreducible polynomial for α and for β over F.
We apply Proposition (2.6), obtaining two isomorphisms


φ: F[x]/(f) → F[α] and ψ: F[x]/(f) → F[β].


The composed map σ = ψφ^{-1} is the required isomorphism. Conversely, if there is
an isomorphism σ sending α to β which is the identity on F, and if f(x) ∈ F[x] is a
polynomial such that f(α) = 0, then f(β) = 0 too [see Proposition (2.11)]. Hence
the two elements have the same irreducible polynomial. □


(2.10) Definition. Let K and K' be two extensions of the same field F. An
isomorphism φ: K → K' which restricts to the identity on the subfield F is called an
isomorphism of field extensions, or an F-isomorphism. Two extensions K, K' of a
field F are said to be isomorphic field extensions if there exists an F-isomorphism
φ: K → K'.


(2.11) Proposition. Let φ: K → K' be an isomorphism of field extensions of F,
and let f(x) be a polynomial with coefficients in F. Let α be a root of f in K, and let
α' = φ(α) be its image in K'. Then α' is also a root of f.


Proof. Say that f(x) = a_n x^n + ··· + a_1 x + a_0. Then φ(a_i) = a_i and φ(α) =
α'. Since φ is a homomorphism, we can expand as follows:


0 = φ(0) = φ(f(α)) = φ(a_n α^n + ··· + a_1 α + a_0)
  = φ(a_n)φ(α)^n + ··· + φ(a_1)φ(α) + φ(a_0)
  = a_n α'^n + ··· + a_1 α' + a_0.

This shows that α' is a root of f. □

For example, the polynomial x^3 − 2 is irreducible over Q. Let α denote the
real cube root of 2, and let ζ = e^{2πi/3} be a complex cube root of 1. The three
complex roots of x^3 − 2 are α, ζα, and ζ^2 α. Therefore there is an isomorphism


(2.12) Q(α) → Q(ζα)


sending α to ζα. In this case the elements of Q(α) are all real numbers, but Q(ζα) is
not a subfield of R. To understand the isomorphism (2.12), we must stop viewing
these fields as subfields of C and look only at their internal algebraic structure.
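
A short floating-point sketch makes the example concrete: it lists the three roots of x^3 − 2 and records which of them are real. The tolerances are arbitrary choices of ours, and the check says nothing about irreducibility.

```python
import cmath

alpha = 2 ** (1 / 3)                    # real cube root of 2
zeta = cmath.exp(2j * cmath.pi / 3)     # a complex cube root of 1

roots = [alpha, zeta * alpha, zeta ** 2 * alpha]

# Each root r should satisfy r^3 - 2 = 0 up to rounding error.
residuals = [abs(r ** 3 - 2) for r in roots]

# Only the first root lies on the real line.
is_real = [abs(complex(r).imag) < 1e-12 for r in roots]
```

The first root generates a subfield of R, while the other two do not, even though the three fields Q(α), Q(ζα), Q(ζ^2 α) are isomorphic as extensions of Q.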


3. THE DEGREE OF A FIELD EXTENSION 


An extension K of a field F can always be regarded as an F-vector space. Addition is
the addition law in K, and scalar multiplication of an element α of K by an element
c of F is defined to be the product cα formed by multiplying these two elements in
K. The dimension of K as an F-vector space is called the degree of the field extension
F ⊂ K. The degree is the simplest invariant of an extension, but though simple,
it is important. It will be denoted by


(3.1) [K : F] = dimension of K, as an F-vector space. 


For example, C has the R-basis (1, i), so [C : R] = 2.

A field extension F ⊂ K is called a finite extension if its degree [K : F] is
finite. Extensions of degree 2 are also called quadratic extensions, those of degree 3
are called cubic extensions, and so on. The degree of an extension F ⊂ K is 1 if and
only if F = K.

The term degree comes from the case that K = F(α) is generated by one algebraic
element α. In that case, K has the basis (1, α, ..., α^{n−1}), where n is the degree
of the irreducible polynomial for α over F [Proposition (2.7)]. Thus we find the first
important property of the degree:


(3.2) Proposition. If α is algebraic over F, then [F(α) : F] is the degree of the
irreducible polynomial for α over F. □


This degree is also called the degree of α over F. Note that an element α has degree
1 over F if and only if it is an element of F, and α has degree ∞ if and only if it is
transcendental over F.

Extensions of degree 2 are easy to describe. 


(3.3) Proposition. Assume that the field F does not have characteristic 2, that is,
that 1 + 1 ≠ 0 in F. Then any extension F ⊂ K of degree 2 can be obtained by
adjoining a square root: K = F(δ), where δ^2 = D is an element of F. Conversely, if δ
is an element of an extension of F, and if δ^2 ∈ F but δ ∉ F, then F(δ) is a
quadratic extension.


Proof. We first show that every quadratic extension is obtained by adjoining a
root of a quadratic polynomial f(x) ∈ F[x]. To do this, we choose any element α of
K which is not in F. Then (1, α) is a linearly independent set over F. Since K has
dimension 2 as a vector space over F, (1, α) is a basis for K over F, and K = F[α].
It follows that α^2 is a linear combination of (1, α), say α^2 = −bα − c, with
b, c ∈ F. Then α is a root of f(x) = x^2 + bx + c.

Since 2 ≠ 0 in F, we can use the quadratic formula α = (1/2)(−b + √(b^2 − 4c)) to
solve the equation x^2 + bx + c = 0. This is proved by direct calculation. There are
two choices for the square root, one of which gives our chosen root α. Let δ denote
that choice: δ = √(b^2 − 4c) = 2α + b. Then δ is in K, and it also generates K
over F. Its square is the discriminant b^2 − 4c, which is in F.

The last assertion of the proposition is clear. □
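
As an illustration of the proof, take F = Q and K = Q(√2), and choose α = 1 + √2. The sketch below carries out the steps with exact arithmetic; representing u + v√2 as a pair (u, v) and the helper name `mul` are assumptions made only for this example.

```python
from fractions import Fraction

D = 2  # work in K = Q(sqrt(2)); a pair (u, v) stands for u + v*sqrt(2)

def mul(x, y):
    """Multiply (u1 + v1*sqrt(D)) by (u2 + v2*sqrt(D))."""
    return (x[0] * y[0] + D * x[1] * y[1], x[0] * y[1] + x[1] * y[0])

alpha = (Fraction(1), Fraction(1))      # alpha = 1 + sqrt(2), not in Q
alpha_sq = mul(alpha, alpha)            # alpha^2 = 3 + 2*sqrt(2)

# Write alpha^2 = -b*alpha - c in the basis (1, alpha):
# the sqrt(2)-coordinates determine b, then the constant terms determine c.
b = -alpha_sq[1] / alpha[1]
c = -(alpha_sq[0] + b * alpha[0])

delta = (2 * alpha[0] + b, 2 * alpha[1])    # delta = 2*alpha + b
delta_sq = mul(delta, delta)
discriminant = b * b - 4 * c
```

Here b = −2 and c = −1, so α is a root of x^2 − 2x − 1, and δ = 2√2 has square 8 = b^2 − 4c, an element of Q as the proof predicts.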


The second important property of the degree is that it is multiplicative in 
towers of fields. 


(3.4) Theorem. Let F ⊂ K ⊂ L be fields. Then [L : F] = [L : K][K : F].




Proof. Let B = (y_1, ..., y_n) be a basis for L as a K-vector space, and let
C = (x_1, ..., x_m) be a basis for K as an F-vector space. So [L : K] = n and
[K : F] = m. We will show that the set of mn products P = (..., x_i y_j, ...) is a basis
of L as an F-vector space, and this will prove the theorem. The same reasoning
will work if B or C is infinite.

Let α be an element of L. Since B is a basis for L over K, we can write
α = β_1 y_1 + ··· + β_n y_n, with β_j ∈ K, in a unique way. Since C is a basis for K
over F, each β_j can be expressed uniquely, as β_j = a_{1j} x_1 + ··· + a_{mj} x_m, with
a_{ij} ∈ F. Thus α = Σ_{i,j} a_{ij} x_i y_j. This shows that P spans L as an F-vector space. We
know that β_j is uniquely determined by α, and since C is a basis for K over F, the
elements a_{ij} are uniquely determined by β_j. So they are uniquely determined by α.
This shows that P is linearly independent, and hence that it is a basis for L over F. □


One important case of a tower of field extensions is that K is a given extension
of F and α is an element of K. Then the field F(α) generated by α is an intermediate
field:


(3.5) F ⊂ F(α) ⊂ K.


(3.6) Corollary. Let K be an extension of F, of finite degree n. Let α be an
element of K. Then α is algebraic over F, and its degree divides n.


To see this, we apply Theorem (3.4) to the fields F ⊂ F(α) ⊂ K and use the fact
that the degree of α over F is [F(α) : F] if α is algebraic, while [F(α) : F] = ∞ if α
is transcendental. □


Here are some sample applications: 


(3.7) Corollary. Let K be a field extension of F of prime degree p, and let α be an
element of K which is not in F. Then α has degree p over F, and K = F(α).


For, p = [K : F] = [K : F(α)][F(α) : F]. One of the terms on the right side is 1.
Since α ∉ F, it is not the second term, so [K : F(α)] = 1 and [F(α) : F] = p.
Therefore K = F(α). □


(3.8) Corollary. Every irreducible polynomial in R[x] has degree 1 or 2. 


We proved this in Chapter 11, Section 1, but let us derive it once more: Let g be an
irreducible real polynomial. Then g has a root α in C. Since [C : R] = 2, the
degree of α over R divides 2, by (3.6). Therefore the degree of g is 1 or 2. □


(3.9) Examples. 


(a) Let α = ∛2, β = ∜5. Consider the field L = Q(α, β) obtained by adjoining
α and β to Q. Then [L : Q] = 12. For L contains the subfield Q(α),
which has degree 3 over Q, because the irreducible polynomial for α over Q is
x^3 − 2. Therefore 3 divides [L : Q]. Similarly, L contains Q(β) and β has
degree 4 over Q, so 4 divides [L : Q]. On the other hand, the degree of β over
the field Q(α) is at most 4, because β is a root of x^4 − 5, and this polynomial
has coefficients in Q(α). The chain of fields L = Q(α, β) ⊃ Q(α) ⊃ Q shows
that [L : Q] is at most 12. So [L : Q] = 12.


(b) It follows by reducing modulo 2 that the polynomial f(x) = x^4 + 2x^3 +
6x^2 + x + 9 is irreducible over Q [Chapter 11 (4.3)]. Let γ be a root of f(x).
Then there is no way to express α = ∛2 rationally in terms of γ, that is,
α ∉ Q(γ). For [Q(α) : Q] = 3, [Q(γ) : Q] = 4, and 3 does not divide 4. So
we can't have Q(γ) ⊃ Q(α). On the other hand, since i has degree 2 over Q,
it is not so easy to decide whether i is in Q(γ). (In fact, it is not.) □
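
The reduction step in example (b) can be checked by brute force. The sketch below tests whether a polynomial over F_p has a monic divisor of degree between 1 and half its own degree; the function names and the coefficient-list convention (constant term first) are our own, and the code assumes that the leading coefficient of f is not divisible by p.

```python
from itertools import product

def poly_rem(a, b, p):
    """Remainder of a on division by the monic polynomial b, over F_p."""
    a = [c % p for c in a]
    while len(a) >= len(b):
        factor = a[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * c) % p
        a.pop()  # the leading coefficient has been cancelled
    return a

def is_irreducible_mod_p(f, p):
    """Try every monic divisor of degree 1 .. deg(f)//2 over F_p."""
    f = [c % p for c in f]
    degree = len(f) - 1
    for d in range(1, degree // 2 + 1):
        for lower in product(range(p), repeat=d):
            g = list(lower) + [1]  # monic polynomial of degree d
            if not any(poly_rem(f, g, p)):
                return False
    return True

f = [9, 1, 6, 2, 1]   # 9 + x + 6x^2 + 2x^3 + x^4
irreducible_mod_2 = is_irreducible_mod_p(f, 2)
```

Modulo 2 the polynomial becomes x^4 + x + 1, and the search finds no divisor of degree 1 or 2, so `irreducible_mod_2` is true. Since f is monic with integer coefficients, this is the hypothesis needed to conclude irreducibility over Q.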


The next two theorems state the most important abstract consequences of the 
multiplicative property of degrees. 


(3.10) Theorem. Let K be an extension of F. The elements of K which are
algebraic over F form a subfield of K.


Proof. Let α, β be algebraic elements of K. We must show that α + β, αβ,
−α, and α^{-1} (if α ≠ 0) are algebraic too. We note that since α is algebraic,
[F(α) : F] < ∞. Moreover, β is algebraic over F, and hence it is also algebraic over
the bigger field F(α). Therefore the field F(α, β), which is generated over F(α) by
β, is a finite extension of F(α), that is, [F(α, β) : F(α)] < ∞. By Theorem (3.4),
[F(α, β) : F] is finite too. Therefore every element of F(α, β) is algebraic over F
(3.6). The elements α + β, αβ, etc. all lie in F(α, β), so they are algebraic. This
proves that the algebraic elements form a field. □


Suppose for example that α = √a, β = √b, where a, b ∈ F. Let us determine
a polynomial having γ = α + β as a root. To do this, we compute the powers
of γ, and we use the relations α^2 = a, β^2 = b to simplify when possible. Then we
look for a linear relation among the powers:


γ^2 = α^2 + 2αβ + β^2 = (a + b) + 2αβ
γ^4 = (a + b)^2 + 4(a + b)αβ + 4α^2 β^2 = (a^2 + 6ab + b^2) + 4(a + b)αβ.


We won't need the other powers because we can eliminate αβ from these two
equations to obtain the equation γ^4 − 2(a + b)γ^2 + (a − b)^2 = 0. Thus γ is a root of the
polynomial

g(x) = x^4 − 2(a + b)x^2 + (a − b)^2,


which has coefficients in F, as required.

This method of undetermined coefficients will always produce a polynomial
having an element such as α + β as a root, if the irreducible polynomials for α and
β are known. Suppose that the degrees of two elements α, β are d_1, d_2, and let
n = d_1 d_2. Any element of F(α, β) is a linear combination, with coefficients in F, of
the n monomials α^i β^j, 0 ≤ i < d_1, 0 ≤ j < d_2. This is because F(α, β) =
F[α, β] (2.6), and these monomials span F[α, β]. Given an element γ ∈ F(α, β),
we write the powers 1, γ, γ^2, ..., γ^n as linear combinations of these monomials, with
coefficients in F. Since there are n + 1 of the powers γ^ν and only n monomials α^i β^j,
the powers are linearly dependent. A linear dependence relation determines a
polynomial with coefficients in F of which γ is a root.
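
The procedure just described can be sketched in code for the case α^2 = a, β^2 = b, with F = Q. Elements of F[α, β] are stored as coordinate 4-tuples with respect to the monomials (1, α, β, αβ), and a dependence relation among the powers of γ is found by row reduction. The names `make_mul` and `dependence` are hypothetical helpers introduced for this illustration, not notation from the text.

```python
from fractions import Fraction

def make_mul(a, b):
    """Multiplication on coordinates (1, α, β, αβ), using α^2 = a and β^2 = b."""
    def mul(x, y):
        x0, x1, x2, x3 = x
        y0, y1, y2, y3 = y
        return (
            x0 * y0 + a * x1 * y1 + b * x2 * y2 + a * b * x3 * y3,  # coefficient of 1
            x0 * y1 + x1 * y0 + b * (x2 * y3 + x3 * y2),            # coefficient of α
            x0 * y2 + x2 * y0 + a * (x1 * y3 + x3 * y1),            # coefficient of β
            x0 * y3 + x3 * y0 + x1 * y2 + x2 * y1,                  # coefficient of αβ
        )
    return mul

def dependence(gamma, mul, n):
    """Return c_0..c_n, not all zero, with c_0 + c_1 γ + ... + c_n γ^n = 0."""
    powers = [(Fraction(1), Fraction(0), Fraction(0), Fraction(0))]
    for _ in range(n):
        powers.append(mul(powers[-1], gamma))
    rows = [[p[r] for p in powers] for r in range(4)]  # one row per monomial
    pivots, r = [], 0
    for c in range(n + 1):
        hit = next((i for i in range(r, 4) if rows[i][c] != 0), None)
        if hit is None:
            continue
        rows[r], rows[hit] = rows[hit], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(4):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = next(c for c in range(n + 1) if c not in pivots)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[free] = Fraction(1)
    for i, pc in enumerate(pivots):
        coeffs[pc] = -rows[i][free]
    return coeffs

a, b = Fraction(2), Fraction(3)
gamma = (Fraction(0), Fraction(1), Fraction(1), Fraction(0))  # γ = α + β
coeffs = dependence(gamma, make_mul(a, b), 4)
```

For a = 2, b = 3 the relation found is 1 − 10γ^2 + γ^4 = 0, which agrees with g(x) = x^4 − 2(a + b)x^2 + (a − b)^2.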

But there is one point which complicates matters. Let g(x) be the polynomial
having γ as a root which we find in this way. This polynomial may be reducible. For
instance, it may happen that γ is actually in the field F, though α, β aren't in F. If
so, the method we described is unlikely to produce its irreducible equation x − γ. It
is harder to determine the irreducible polynomial for γ over F. □

An extension K of a field F is called an algebraic extension, and K is said to be 
algebraic over F, if all its elements are algebraic. 


(3.11) Theorem. Let F ⊂ K ⊂ L be fields. If L is algebraic over K and K is
algebraic over F, then L is algebraic over F.


Proof. We need to show that every element α ∈ L is algebraic over F. We
are given that α is algebraic over K, hence that some equation of the form


α^n + a_{n−1} α^{n−1} + ··· + a_1 α + a_0 = 0


holds, with a_0, ..., a_{n−1} ∈ K. Therefore α is algebraic over the field F(a_0, ..., a_{n−1})
generated by a_0, ..., a_{n−1} over F. Note that each coefficient a_i, being in K, is
algebraic over F. We consider the chain of fields


F ⊂ F(a_0) ⊂ F(a_0, a_1) ⊂ ··· ⊂ F(a_0, a_1, ..., a_{n−1}) ⊂ F(a_0, a_1, ..., a_{n−1}, α)


obtained by adjoining the elements a_0, ..., a_{n−1}, α in succession. For each i, a_{i+1} is
algebraic over F(a_0, ..., a_i) because it is algebraic over F. Also, α is algebraic over
F(a_0, a_1, ..., a_{n−1}). So each extension in the chain is finite. By Theorem (3.4), the
degree of F(a_0, a_1, ..., a_{n−1}, α) over F is finite. Therefore by Corollary (3.6) α is
algebraic over F. □


4. CONSTRUCTIONS WITH RULER AND COMPASS 


There are famous theorems which assert that certain geometric constructions, such 
as trisection of an angle, can not be done with ruler and compass alone. We will 
now use the concept of degree of a field extension to prove some of them. 

Here are the rules for basic ruler and compass construction: 


(4.1)

(a) Two points in the plane are given to start with. These points are considered to
be constructed.

(b) If two points have been constructed, we may draw the line through them, or
draw a circle with center at one point and passing through the other. Such lines
and circles are then considered to be constructed.

(c) The points of intersection of lines and circles which have been constructed are
considered to be constructed.


Note that our ruler may be used only to draw straight lines through constructed 
points. We are not allowed to use it for measurement. Sometimes it is referred to as 
a “straight-edge” to make this point clear. 

We will describe all possible constructions, beginning with some familiar ones. 
In each figure, the lines and circles are to be drawn in the order indicated. 


(4.2) Construction. Draw a line through a constructed point p and perpendicular
to a constructed line ℓ.


Case 1: p ∈ ℓ


This construction works with any point q ∈ ℓ which is not on the perpendicular.
However, we had better not choose points arbitrarily, because if we do we'll
have difficulty keeping track of which points we have constructed and which ones are
merely artifacts of an arbitrary choice. Whenever we want an arbitrary point, we
will construct a particular one for the purpose.


Case 2: p ∉ ℓ


(4.3) Construction. Draw a line parallel to ℓ and passing through p. Apply Cases
1 and 2 above:


(4.4) Construction. Mark off a length defined by two points onto a constructed
line ℓ, starting at a constructed point p ∈ ℓ. Use construction of parallels.


[Figure: the length marked off on ℓ.]


These constructions allow us to introduce Cartesian coordinates into the plane
so that the two points which are given to us to start have coordinates (0,0) and
(0, 1). Other choices of coordinate systems could be used, but they lead to equivalent
theories.


We will call a real number a constructible if its absolute value |a| is the dis- 
tance between two constructible points, the unit length being the distance between 
the points given originally. 


(4.5) Proposition. A point p = (a,b) is constructible if and only if its Cartesian 
coordinates a and b are constructible numbers. 
Proof. This follows from the above constructions. Given a point p, we can
construct its coordinates by dropping perpendiculars to the axes. Conversely, if a
and b are given constructible numbers, then we can construct the point p by marking
a, b off on the two axes using (4.4) and erecting perpendiculars. □


(4.6) Proposition. The constructible numbers form a subfield of R. 


Proof. We will show that if a and b are positive constructible numbers, then
a + b, ab, a − b (if a > b), and a^{-1} (if a ≠ 0) are also constructible. The closure
in case a or b is negative follows easily.

Addition and subtraction are done by marking lengths on a line, using Con- 
struction (4.4). 

For multiplication, we use similar right triangles:


[Figure: two similar right triangles, with sides labeled r, s and r', s'.]


Given one triangle and one side of a second triangle, the second triangle can be
constructed by parallels.

To construct the product ab, we take r = 1, s = a, and r' = b. Then since
r/s = r'/s', it follows that s' = ab. To construct a^{-1}, we take r = a, s = 1, and
r' = 1. Then s' = a^{-1}. □

(4.7) Proposition. If a is a positive constructible number, then so is √a.


Proof. We use similar triangles again. We must construct them so that r = a,
r' = s, and s' = 1. Then s = r' = √a.

How to make the construction is less obvious this time, but we can use
inscribed triangles in a circle. A triangle inscribed into a circle, with a diameter as its
hypotenuse, is a right triangle. This is a theorem of high school geometry. It can be
checked using the equation for a circle and Pythagoras's theorem. So we draw a
circle whose diameter is 1 + a and proceed as in the figure below. Note that the large
triangle is divided into two similar triangles.


[Figure: right triangle inscribed in a circle of diameter 1 + a, with segments labeled √a, 1, and a.]
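
The length produced by this construction can be checked with coordinates. In the sketch below we assume, as one way to set up the figure, that the diameter runs from (0, 0) to (1 + a, 0) and that the perpendicular is erected at the point dividing it into segments of lengths a and 1; the function name `altitude_squared` is ours.

```python
from fractions import Fraction

def altitude_squared(a):
    """Squared height of the circle of diameter 1 + a above the point that
    splits the diameter into segments of lengths a and 1."""
    a = Fraction(a)
    radius = (1 + a) / 2
    center_x = radius        # the diameter runs from (0, 0) to (1 + a, 0)
    foot_x = a               # foot of the altitude, at distance a from (0, 0)
    # A point (foot_x, h) on the circle satisfies (foot_x - center_x)^2 + h^2 = radius^2.
    return radius ** 2 - (foot_x - center_x) ** 2

checks = {a: altitude_squared(a) for a in (2, 5, Fraction(7, 3))}
```

In each case the squared height equals a, so the altitude has length √a, in agreement with the relation s^2 = a obtained from the similar triangles.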


(4.8) Proposition. Suppose four points are given, whose coordinates are in a
subfield F of R. Let A, B be lines or circles drawn using the given points. Then the
points of intersection of A and B have coordinates in F, or in a field of the form
F(√r), where r is a positive number in F.


Proof. The line through (a_0, b_0), (a_1, b_1) has the linear equation

(a_1 − a_0)(y − b_0) = (b_1 − b_0)(x − a_0).


The circle with center (a_0, b_0) and passing through (a_1, b_1) has the quadratic
equation

(x − a_0)^2 + (y − b_0)^2 = (a_1 − a_0)^2 + (b_1 − b_0)^2.


The intersection of two lines can be found by solving two linear equations whose
coefficients are in F. So its coordinates are in F too. To find the intersection of a line
and a circle, we use the equation of the line to eliminate one variable from the
equation of the circle, obtaining a quadratic equation in one unknown. This quadratic
equation has solutions in the field F(√D), where D is the discriminant, which is an
element of F. If D < 0, the line and circle do not intersect.

Consider the intersection of two circles, say


(x − a_1)^2 + (y − b_1)^2 = r_1^2 and (x − a_2)^2 + (y − b_2)^2 = r_2^2,


where a_i, b_i, r_i ∈ F. In general, the solution of a pair of quadratic equations in two
variables requires solving an equation of degree 4. In this case we are lucky: The
difference of the two quadratic equations is a linear equation which we can use to
eliminate one variable, as before. □
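
The elimination used for two circles can be sketched as follows. The code subtracts the two equations to obtain a line, and then intersects that line with the first circle, so that only one square root is taken. It works in floating point, represents a circle as a triple (a, b, r), and uses a function name `circle_circle` of our own choosing.

```python
import math

def circle_circle(c1, c2):
    """Intersect (x-a1)^2 + (y-b1)^2 = r1^2 with (x-a2)^2 + (y-b2)^2 = r2^2.

    Returns a list of 0, 1 or 2 points.
    """
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    # The difference of the two equations is the line A*x + B*y = C.
    A = 2 * (a2 - a1)
    B = 2 * (b2 - b1)
    C = (r1**2 - r2**2) - (a1**2 - a2**2) - (b1**2 - b2**2)
    norm2 = A * A + B * B
    if norm2 == 0:
        raise ValueError("the circles are concentric")
    # Foot of the perpendicular from the first center to the line.
    t = (C - A * a1 - B * b1) / norm2
    foot = (a1 + A * t, b1 + B * t)
    # The remaining quadratic has a solution only if this quantity is >= 0.
    disc = r1**2 - t * t * norm2
    if disc < 0:
        return []
    h = math.sqrt(disc / norm2)
    if h == 0:
        return [foot]
    return [(foot[0] - B * h, foot[1] + A * h),
            (foot[0] + B * h, foot[1] - A * h)]

points = circle_circle((0, 0, 1), (1, 0, 1))
```

For the two unit circles centered at (0, 0) and (1, 0), the line is 2x = 1 and the two intersection points have coordinates 1/2 and ±√3/2, which lie in Q(√3).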


(4.9) Theorem. Let a_1, ..., a_m be constructible real numbers. There is a chain of
subfields Q = F_0 ⊂ F_1 ⊂ F_2 ⊂ ··· ⊂ F_n = K such that


(i) K is a subfield of R;
(ii) a_1, ..., a_m ∈ K;
(iii) for each i = 0, ..., n − 1, the field F_{i+1} is obtained from F_i by adjoining the
square root of a positive number r_i ∈ F_i, which is not a square in F_i.


Conversely, let Q = F_0 ⊂ F_1 ⊂ ··· ⊂ F_n = K be a chain of subfields of R which
satisfies (iii). Then every element of K is constructible.


Proof. We introduced coordinates so that the points originally given have
coordinates in Q. The process of constructing the numbers a_i involves drawing lines
and circles and taking their intersections. So the first assertion follows by induction
from Proposition (4.8). Conversely, if such a tower of fields is given, then its
elements are constructible, by Propositions (4.6) and (4.7). □


(4.10) Corollary. If a is a constructible real number, then it is algebraic, and its
degree over Q is a power of 2.


For, in the chain of fields (4.9), the degree of F_{i+1} over F_i is 2, and hence
[K : Q] = 2^n. Corollary (3.6) tells us that the degree of a divides 2^n, hence that it is
a power of 2. □
The converse of Corollary (4.10) is false. There exist real numbers a which have
degree 4 over Q but which are not constructible. We will be able to prove this later,
using Galois theory.

We can now prove the impossibility of certain geometric constructions. Our
method will be to show that if a certain construction were possible, then it would
also be possible to construct an algebraic number whose degree over Q is not a
power of 2. This would contradict (4.10).

Let us discuss trisection of the angle as the first example. We must pose the 
problem carefully, because many angles, 45° for instance, can be trisected. The cus- 
tomary way to state the problem is to ask for a single method of construction which 
will work for any given angle. 

To be as specific as possible, let us say that an angle θ is constructible if its
cosine cos θ is constructible. Other equivalent definitions are possible. For example,
with this definition, θ is constructible if and only if the line which passes through the
origin and meets the x-axis in the angle θ is constructible. Or, θ is constructible if
and only if it is possible to construct any two lines meeting in an angle θ.

Now just giving an angle θ (say by marking off its cosine on the x-axis)
provides us with new information which may be used in a hypothetical trisection. To
analyze the consequences of this new information, we should start over and
determine all constructions which can be made when, in addition to two points, one more
length (= cos θ) is given at the start. We would rather not take the time to do this,
and there is a way out. We will exhibit a particular angle θ with these properties:


(4.11) (i) $\theta$ is constructible, and
(ii) $\frac{1}{3}\theta$ is not constructible.


The first condition tells us that being given the angle $\theta$ provides no new information
for us: If the angle $\theta$ can be trisected when given, it can also be trisected without
being given. The second condition tells us that there is no general method of
trisection, because there is no way to trisect $\theta$.

The angle $\theta = 60^\circ$ does the job. A 60° angle is constructible because
$\cos 60^\circ = \frac{1}{2}$. On the other hand, it is impossible to construct a 20° angle. To show
this, we will show that $\cos 20^\circ$ is an algebraic number of degree 3 over Q. Then
Corollary (4.10) will show that $\cos 20^\circ$ is not constructible, hence that 60° can not
be trisected.

The addition formulas for sine and cosine can be used to prove the identity 


(4.12) $\cos 3\theta = 4\cos^3\theta - 3\cos\theta$.

Setting $\theta = 20^\circ$ and $\alpha = \cos 20^\circ$, we obtain the equation $\frac{1}{2} = 4\alpha^3 - 3\alpha$, or
$8\alpha^3 - 6\alpha - 1 = 0$.

(4.13) Lemma. The polynomial $f(x) = 8x^3 - 6x - 1$ is irreducible over Q.


Proof. Since f has degree 3, any nontrivial factorization would include a linear factor.
So it is enough to check for linear factors ax + b, where a, b are integers
such that a divides 8, and $b = \pm 1$. Another way to prove irreducibility is to check
that f has no root modulo 5. □
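
Both checks named in the proof are finite and can be carried out mechanically. The following is a minimal sketch in Python; the names `f`, `candidates`, `rational_roots` and `roots_mod_5` are illustrative and do not come from the text.

```python
from fractions import Fraction

def f(x):
    """The cubic f(x) = 8x^3 - 6x - 1 of Lemma (4.13)."""
    return 8 * x**3 - 6 * x - 1

# A linear factor ax + b with a | 8 and b = +-1 corresponds to a root -b/a,
# so the only possible rational roots are +-1, +-1/2, +-1/4, +-1/8.
candidates = [Fraction(b, a) for a in (1, 2, 4, 8) for b in (1, -1)]
rational_roots = [c for c in candidates if f(c) == 0]

# Reduction modulo 5: the leading coefficient 8 is not divisible by 5,
# so the reduced polynomial is still a cubic, and a cubic without a root
# has no linear factor.
roots_mod_5 = [x for x in range(5) if f(x) % 5 == 0]

print(rational_roots)  # []
print(roots_mod_5)     # []
```

Either empty list is enough on its own to conclude that f is irreducible over Q.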

This lemma tells us that $\alpha$ has degree 3 over Q, hence that it can not be constructed.

As another example, let us show that the regular 7-gon can not be constructed.
This is similar to the above problem: The construction of 20° is equivalent to the
construction of the 18-gon. Let $\theta$ denote the angle $2\pi/7$ and let
$\zeta = \cos\theta + i\sin\theta$. Then $\zeta$ is a root of the equation $x^6 + x^5 + \cdots + 1 = 0$,
which is irreducible [Chapter 11 (4.6)]. Hence $\zeta$ has degree 6 over Q. If the 7-gon
were constructible, then $\cos\theta$ and $\sin\theta$ would be constructible numbers, and hence
they would lie in a real field extension of degree $2^n$ over Q, by Theorem (4.9). Call
this field K, and consider the extension K(i). This extension has degree 2. Therefore
$[K(i) : Q] = 2^{n+1}$. But $\zeta = \cos\theta + i\sin\theta \in K(i)$. This contradicts the fact that
the degree of $\zeta$ is 6 (3.6).

Notice that this argument is not special to the number 7. It applies to any
prime integer p, provided only that p - 1, the degree of the irreducible polynomial
$x^{p-1} + \cdots + x + 1$, is not a power of 2.


(4.14) Corollary. Let p be a prime integer. If the regular p-gon can be constructed
by ruler and compass, then $p = 2^r + 1$ for some integer r.


Gauss proved the converse: If a prime has the form $2^r + 1$, then the regular p-gon
can be constructed. The regular 17-gon, for example, can be constructed with ruler
and compass. We will learn how to prove this in the next chapter.
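
To see how restrictive the condition of Corollary (4.14) is, one can list the small primes p for which p - 1 is a power of 2. This is only an illustrative sketch; the helper names and the search bounds 300 and 30 are arbitrary choices, not part of the text.

```python
from math import isqrt

def is_prime(n):
    """Trial division; adequate for the small range used here."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def is_power_of_two(m):
    """True for 1, 2, 4, 8, ..."""
    return m >= 1 and (m & (m - 1)) == 0

# Primes p below 300 with p - 1 a power of 2: the only ones that pass the test.
candidates = [p for p in range(2, 300) if is_prime(p) and is_power_of_two(p - 1)]

# Primes below 30 ruled out by the corollary (p - 1 is not a power of 2).
excluded = [p for p in range(3, 30) if is_prime(p) and not is_power_of_two(p - 1)]

print(candidates)  # [2, 3, 5, 17, 257]
print(excluded)    # [7, 11, 13, 19, 23, 29]
```

So the regular 7-, 11- and 13-gons, among others, can not be constructed, while 17 and 257 pass the test.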


5. SYMBOLIC ADJUNCTION OF ROOTS 


Up to this point, we have used subfields of the complex numbers as our examples. 
Abstract constructions are not needed to create these fields (except that the construc- 
tion of C from R is abstract). We simply adjoin complex numbers to the rational 
numbers as desired and work with the subfield they generate. But finite fields and 
function fields are not subfields of a familiar, all-encompassing field analogous to C, 
so these fields must be constructed. The fundamental tool for their construction is 
the adjunction of elements to a ring, which we studied in Section 5 of Chapter 10. It 
is applied here to the case that the ring we start with is a field F. 

Let us review this construction. Given a polynomial f(x) with coefficients in F, 
we may adjoin an element a satisfying the polynomial equation f(a) = 0 to F. The 
abstract procedure is to form the polynomial ring F [x] and then take the quotient 
ring 


(5.1) $R' = F[x]/(f)$.


This construction always yields a ring R' and a homomorphism $F \to R'$, such that
the residue $\bar{x}$ of x satisfies the relation $f(\bar{x}) = 0$.

However, we want to construct not only a ring, but a field, and here the theory
of polynomials over a field comes into play. Namely, that theory tells us that the
principal ideal (f) is a maximal ideal if and only if f is irreducible [Chapter 11
(1.6)]. Therefore the ring R' will be a field if and only if f is an irreducible
polynomial.




(5.2) Lemma. Let F be a field, and let f be an irreducible polynomial in F[x].
Then the ring K = F[x]/(f) is an extension field of F, and the residue $\bar{x}$ of x is a
root of f(x) in K.


Proof. The ring K is a field because (f) is a maximal ideal. Also, the
homomorphism $F \to K$, which sends the elements of F to the residues of the constant
polynomials, is injective, because F is a field. So we may identify F with its image,
a subfield of K. The field K becomes an extension of F by means of this
identification. Finally, $\bar{x}$ satisfies the equation $f(\bar{x}) = 0$, which means that it is a root
of f. □


(5.3) Proposition. Let F be a field, and let f(x) be a monic polynomial in F[x] of
positive degree. There exists a field extension K of F such that f(x) factors into linear
factors over K.


Proof. We use induction on the degree of f. The first case is that f has a root $\alpha$
in F, so that $f(x) = (x - \alpha)g(x)$ for some polynomial g. If so, we replace f by g,
and we are done by induction. Otherwise, we choose an irreducible factor g(x) of
f(x). By Lemma (5.2), there is a field extension of F, call it $F_1$, in which g(x) has a
root $\alpha$. We replace F by $F_1$ and are thereby reduced to the first case. □


As we have seen, the polynomial ring F[x] is an important tool for studying 
extensions of a field F. When we are working with two fields at the same time, there 
is an interplay between their polynomial rings. This interplay doesn’t present serious 
difficulties, but instead of scattering the points which need to be mentioned about in 
the text, we have collected them here. 

Notice that if K is an extension field of F, then the polynomial ring K[x]
contains F[x] as a subring. So computations which are made in the ring F[x] are also
valid in K[x].


(5.4) Proposition. Let f and g be polynomials with coefficients in a field F, and 
let K be an extension field of F. 


(a) Division with remainder of g by f gives the same answer, whether carried out 
in F[x] or in K [x]. 

(b) f divides g in K[x] if and only if f divides g in F[x].

(c) The monic greatest common divisor d of f and g is the same, whether com- 
puted in F [x] or in K[x]. 

(d) If f and g have a common root in K, then they are not relatively prime in F[x].
Conversely, if f and g are not relatively prime in F[x], then there exists an
extension field L in which they have a common root.

(e) If f is irreducible in F[x] and if f and g have a common root in K, then f
divides g in F[x].

Proof. (a) Carry out the division in F[x]: g = fq + r. This equation also
holds in the bigger ring K[x], and further division of the remainder by f is not
possible, because r has lower degree than f, or else it is zero.

(b) This is the case that the remainder is zero in (a).


(c) Let d, d' denote the monic greatest common divisors of f and g in F[x] and in
K[x]. Then d is also a common divisor in K[x]. So d divides d' in K[x], by
definition of d'. In addition, we know that d has the form d = pf + qg, for some
elements $p, q \in F[x]$. Since d' divides f and g, it divides pf + qg = d too. Thus d
and d' are associates in K[x], and, being monic, they are equal.


(d) Let $\alpha$ be a common root of f and g in K. Then $x - \alpha$ is a common divisor of f
and g in K[x]. So their greatest common divisor in K[x] is not 1. By (c), it is not 1
in F[x] either. Conversely, if f and g have a common divisor d of degree > 0, then
by (5.3), d has a root in some extension field L. This root will be a common root of
f and g.


(e) If f is irreducible, then its only divisors in F[x] are 1, f, and their associates.
Part (d) tells us that the greatest common divisor of f and g in F[x] is not 1.
Therefore it is f. □


The final topic of this section concerns the derivative f'(x) of a polynomial
f(x). In algebra, the derivative is computed using the rules from calculus for
differentiating polynomial functions. In other words, we define the derivative of $x^n$ to be
the polynomial $nx^{n-1}$, and if $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, then


(5.5) $f'(x) = n a_n x^{n-1} + (n-1) a_{n-1} x^{n-2} + \cdots + a_1$.


The integer coefficients in this formula are to be interpreted as elements of F by
means of the homomorphism $Z \to F$ [Chapter 10 (3.18)]. So the derivative is a
polynomial with coefficients in the same field. It can be shown that rules such as the
product rule for differentiation hold.

Though differentiation is an algebraic procedure, there is no a priori reason to 
suppose that it has much algebraic significance; however, it does. For us, the most 
important property of the derivative is that it can be used to recognize multiple roots 
of a polynomial. 


(5.6) Lemma. Let F be a field, let $f(x) \in F[x]$ be a polynomial, and let $\alpha \in F$
be a root of f(x). Then $\alpha$ is a multiple root, meaning that $(x - \alpha)^2$ divides f(x), if
and only if it is a root of both f(x) and f'(x).


Proof. If $\alpha$ is a root of f, then $x - \alpha$ divides f: $f(x) = (x - \alpha)g(x)$. Then $\alpha$
is a root of g if and only if it is a multiple root of f. By the product rule for
differentiation,

$f'(x) = (x - \alpha)g'(x) + g(x)$.

Substituting $x = \alpha$ shows that $f'(\alpha) = 0$ if and only if $g(\alpha) = 0$. □


(5.7) Proposition. Let $f(x) \in F[x]$ be a polynomial. There exists a field extension
K of F in which f has a multiple root if and only if f and f' are not relatively
prime.

Proof. If f has a multiple root in K, then f and f' have a common root in K by
Lemma (5.6), and so they are not relatively prime in K[x]. Hence they are not
relatively prime in F[x] either (5.4c). Conversely, if f and f' are not relatively prime, then they
have a common root in some field extension K, hence f has a multiple root there. □
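
Proposition (5.7) turns the search for multiple roots into a greatest common divisor computation, which the Euclidean algorithm carries out without leaving F[x]. Here is a minimal sketch for the case that F is a prime field of order p, with polynomials stored as coefficient lists (constant term first). The function names and the two sample polynomials are assumptions made for illustration.

```python
def trim(a):
    """Drop trailing zeros; coefficient lists run from the constant term upward."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a

def derivative(a, p):
    """Formal derivative mod p: the coefficient a_k of x^k contributes k*a_k to x^(k-1)."""
    return trim([(k * c) % p for k, c in enumerate(a)][1:])

def poly_rem(a, b, p):
    """Remainder of a on division by a nonzero polynomial b, coefficients mod p."""
    a = trim([c % p for c in a])
    b = trim([c % p for c in b])
    inv = pow(b[-1], -1, p)          # inverse of the leading coefficient (Python 3.8+)
    while len(a) >= len(b):
        factor = (a[-1] * inv) % p
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * c) % p
        a = trim(a)
    return a

def monic_gcd(a, b, p):
    """Monic greatest common divisor of a and b (not both zero), coefficients mod p."""
    a, b = trim([c % p for c in a]), trim([c % p for c in b])
    while b:
        a, b = b, poly_rem(a, b, p)
    inv = pow(a[-1], -1, p)
    return [(c * inv) % p for c in a]

f = [2, -3, 0, 1]                    # x^3 - 3x + 2 = (x - 1)^2 (x + 2), read modulo 7
gcd_f = monic_gcd(f, derivative(f, 7), 7)
print(gcd_f)                         # [6, 1], that is x - 1: a multiple root at x = 1

g = [0, -1] + [0] * 7 + [1]          # x^9 - x, read modulo 3
gcd_g = monic_gcd(g, derivative(g, 3), 3)
print(gcd_g)                         # [1]: relatively prime, so no multiple root
```

The second example anticipates the polynomials $x^q - x$ that appear in the section on finite fields.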


Here is one of the most important applications of the derivative to field theory: 


(5.8) Proposition. Let f be an irreducible polynomial in F [x]. Then f has no mul- 
tiple root in any field extension of F unless the derivative f’ is the zero polynomial. 
In particular, if F is a field of characteristic zero, then f has no multiple root. 


Proof. By the previous proposition, we must show that f and f' are relatively
prime unless f' is the zero polynomial. Since f is irreducible, the only way that it can
have a nonconstant factor in common with another polynomial g is for f to divide g
(5.4e). And if f divides g, then $\deg g \geq \deg f$, or else g = 0. Now the degree of
the derivative f' is less than the degree of f. So f and f' have no nonconstant factor
in common unless f' = 0, as required. In a field of characteristic zero, the
derivative of a nonconstant polynomial is not zero. □


The derivative of a nonconstant polynomial f(x) may be identically zero if F
has prime characteristic p. This happens when the exponent of every monomial
occurring in f is divisible by p. A typical polynomial whose derivative is zero in
characteristic 5 is

$f(x) = x^{15} + ax^{10} + bx^5 + c$,


where a, b,c can be arbitrary elements of F. Since the derivative of this polynomial 
is identically zero, its roots in any extension field are all multiple roots. Whether or 
not this polynomial is irreducible depends on F and on a, b,c. 
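
The vanishing of such a derivative is easy to confirm numerically. In the sketch below the coefficients a, b, c are given the sample values 2, 3, 4, which is an arbitrary choice made only for illustration, and the polynomial is stored as a coefficient list with the constant term first.

```python
def derivative_mod_p(coeffs, p):
    """Formal derivative of sum(coeffs[k] * x^k), with coefficients reduced mod p."""
    return [(k * c) % p for k, c in enumerate(coeffs)][1:]

a, b, c = 2, 3, 4                      # sample values, an assumption of this sketch
f = [0] * 16
f[15], f[10], f[5], f[0] = 1, a, b, c  # x^15 + a x^10 + b x^5 + c

zero_in_char_5 = not any(derivative_mod_p(f, 5))
zero_in_char_7 = not any(derivative_mod_p(f, 7))

print(zero_in_char_5)  # True: every exponent is divisible by 5
print(zero_in_char_7)  # False: for instance 15 is not divisible by 7
```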


6. FINITE FIELDS 


In this section, we describe all fields having finitely many elements. We remarked in
Section 1 that a finite field K contains one of the prime fields $\mathbb{F}_p$, and of course since
K is finite, it will be finite-dimensional when considered as a vector space over this
field. Let us denote $\mathbb{F}_p$ by F, and let r denote the degree [K : F]. As an F-vector
space, K is isomorphic to the space $F^r$, and this space contains $p^r$ elements. So the
order of a finite field is always a power of a prime. It is customary to use the letter q
for this number:


(6.1) $q = p^r = |K|$.


When referring to finite fields, p will always denote a prime integer and q a power of
p, the number of elements, or order, of the field.

Fields with q elements are often denoted by $\mathbb{F}_q$. We are going to show that all
fields with the same number of elements are isomorphic, so this notation is not too
ambiguous. However, the isomorphism will not be unique when r > 1.

The simplest example of a finite field other than the prime field $\mathbb{F}_p$ is the field
$K = \mathbb{F}_4$ of order 4. There is a unique irreducible polynomial f(x) of degree 2 in
$\mathbb{F}_2[x]$, namely


(6.2) $f(x) = x^2 + x + 1$


[see Chapter 11 (4.3)], and the field K is obtained by adjoining a root $\alpha$ of f(x) to
$F = \mathbb{F}_2$:


$K = F[x]/(x^2 + x + 1)$.


The order of this field is 4 because $\alpha$ has degree 2, which tells us that K has
dimension 2 as a vector space over the field F.

The set $(1, \alpha)$ forms a basis of K over F, so the elements of K are the four
linear combinations of these two elements, with mod-2 coefficients 0, 1. They are


(6.3) $\{0, 1, \alpha, 1 + \alpha\} = \mathbb{F}_4$.


The element $1 + \alpha$ is the second root of the polynomial f(x) in K. Computation in K
is made using the relations 1 + 1 = 0 and $\alpha^2 + \alpha + 1 = 0$. Do not confuse the
field $\mathbb{F}_4$ with the ring Z/(4)!
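
The two relations are all that is needed to compute in this field. As a small sketch, an element $c_0 + c_1\alpha$ can be stored as the pair (c0, c1); the representation and the function names below are choices made for this illustration only.

```python
ZERO, ONE, ALPHA, ONE_PLUS_ALPHA = (0, 0), (1, 0), (0, 1), (1, 1)
ELEMENTS = [ZERO, ONE, ALPHA, ONE_PLUS_ALPHA]

def add(u, v):
    """Componentwise addition mod 2 (uses 1 + 1 = 0)."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    """Expand the product and replace alpha^2 by alpha + 1."""
    c0 = u[0] * v[0] + u[1] * v[1]                 # constant term, including alpha^2 -> 1
    c1 = u[0] * v[1] + u[1] * v[0] + u[1] * v[1]   # alpha term, including alpha^2 -> alpha
    return (c0 % 2, c1 % 2)

def f(u):
    """Evaluate x^2 + x + 1 at u."""
    return add(add(mul(u, u), u), ONE)

roots_of_f = [u for u in ELEMENTS if f(u) == ZERO]
print(roots_of_f)                      # [(0, 1), (1, 1)]: alpha and 1 + alpha
print(mul(ALPHA, ONE_PLUS_ALPHA))      # (1, 0): alpha and 1 + alpha are inverse to each other
print((2 * 2) % 4)                     # 0: in Z/(4) the residue of 2 is a zero divisor
```

The last line shows the difference from Z/(4): every nonzero element of the field has an inverse, while 2 · 2 = 0 in Z/(4).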

Here are the main facts about finite fields: 


(6.4) Theorem. Let p be a prime, and let $q = p^r$ be a power of p, with $r \geq 1$.


(a) There exists a field of order q.

(b) Any two fields of order q are isomorphic.

(c) Let K be a field of order q. The multiplicative group $K^\times$ of nonzero elements
of K is a cyclic group of order q - 1.

(d) The elements of K are roots of the polynomial $x^q - x$. This polynomial has
distinct roots, and it factors into linear factors in K.

(e) Every irreducible polynomial of degree r in $\mathbb{F}_p[x]$ is a factor of $x^q - x$. The
irreducible factors of $x^q - x$ in $\mathbb{F}_p[x]$ are precisely the irreducible polynomials in
$\mathbb{F}_p[x]$ whose degree divides r.

(f) A field K of order q contains a subfield of order $q' = p^k$ if and only if k
divides r.


The proof of this theorem is not very difficult, but since there are several 
parts, it will take some time. To motivate it, we will look at a few consequences 
first. 

The striking aspect of (c) is that all nonzero elements of K can be listed as
powers of a single suitably chosen one. This is not obvious, even for the prime field
$\mathbb{F}_p$. For example, the residue of 3 is a generator of $\mathbb{F}_7^\times$. Its powers $3^0, 3^1, 3^2, \ldots$ list
the nonzero elements of $\mathbb{F}_7$ in the following order:


(6.5) $\mathbb{F}_7^\times = \{1, 3, 2, 6, 4, 5\}$.

As another example, 2 is a generator of $\mathbb{F}_{11}^\times$, and its powers list that group in the
order


(6.6) $\mathbb{F}_{11}^\times = \{1, 2, 4, 8, 5, 10, 9, 7, 3, 6\}$.


A generator for the cyclic group $\mathbb{F}_p^\times$ is called a primitive element modulo p.
Note that the theorem does not tell us how to find a primitive element, only that one
exists. Which residues modulo p are primitive elements is not well understood, but
given a small prime p, we can find one by trial and error.
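
The trial-and-error search is short enough to write out. The sketch below lists the powers of a residue and tests whether they exhaust the nonzero residues; the function names are illustrative, and no attempt is made at efficiency.

```python
def powers(g, p):
    """The list g^0, g^1, ..., g^(p-2) of residues modulo p."""
    out, x = [], 1
    for _ in range(p - 1):
        out.append(x)
        x = (x * g) % p
    return out

def is_primitive(g, p):
    """g is primitive modulo p when its powers are p - 1 distinct residues."""
    return len(set(powers(g, p))) == p - 1

def smallest_primitive_element(p):
    """Trial and error, starting from 1; p is assumed to be prime."""
    return next(g for g in range(1, p) if is_primitive(g, p))

print(powers(3, 7))                    # [1, 3, 2, 6, 4, 5], as in (6.5)
print(powers(2, 11))                   # [1, 2, 4, 8, 5, 10, 9, 7, 3, 6], as in (6.6)
print(smallest_primitive_element(7))   # 3, since the powers of 2 are only 1, 2, 4
```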

We now have two ways of listing the nonzero elements of $\mathbb{F}_p$, additively and
multiplicatively:


(6.7) $\mathbb{F}_p^\times = \{1, 2, 3, \ldots, p-1\} = \{1, \nu, \nu^2, \ldots, \nu^{p-2}\}$,


where $\nu$ is a primitive element modulo p. Depending on the context, one or the other
list may be the best for computation.

Of course, the additive group $\mathbb{F}_p^+$ of the prime field is always a cyclic group of
order p. Both the additive and multiplicative structures of the prime field are very
simple: They are cyclic. But the field structure of $\mathbb{F}_p$, governed by the distributive
law, fits the two together in a subtle way.

Part (e) of the theorem is also striking. It is the basis for many methods of fac- 
toring polynomials modulo p. Let us look at a few cases in which q is a power of 2 
as examples: 


(6.8) Examples. 
(a) The elements of the field $\mathbb{F}_4$ are the roots of the polynomial


(6.9) $x^4 - x = x(x - 1)(x^2 + x + 1)$.


In this case, the irreducible factors of $x^4 - x$ in Z[x] happen to remain irreducible in
$\mathbb{F}_2[x]$. Note that the factors of $x^2 - x$ appear here, because $\mathbb{F}_4$ contains $\mathbb{F}_2$.
Since we are working in characteristic 2, the signs are irrelevant:


$x - 1 = x + 1$.

(b) The field $\mathbb{F}_8$ of order 8 has degree 3 over the prime field $\mathbb{F}_2$. Its elements are the
eight roots of the polynomial


(6.10) $x^8 - x = x(x - 1)(x^3 + x + 1)(x^3 + x^2 + 1)$, in $\mathbb{F}_2[x]$.


So the six elements in $\mathbb{F}_8$ which aren't in $\mathbb{F}_2$ fall into two classes: the three roots of
$x^3 + x + 1$ and the three roots of $x^3 + x^2 + 1$.

The cubic factors of (6.10) are the two irreducible polynomials of degree
3 in $\mathbb{F}_2[x]$ [see Chapter 11 (4.3)]. Notice that the irreducible factorization of this
polynomial in the ring of integers is

(6.11) $x^8 - x = x(x - 1)(x^6 + x^5 + \cdots + x + 1)$, in Z[x].

The third factor is reducible modulo 2.


To compute in the field $\mathbb{F}_8$, choose a root $\beta$ of one of the cubics, say of
$x^3 + x + 1$. Then $(1, \beta, \beta^2)$ is a basis of $\mathbb{F}_8$ as a vector space over $\mathbb{F}_2$. The elements
of $\mathbb{F}_8$ are the eight linear combinations with coefficients 0, 1:


(6.12) $\mathbb{F}_8 = \{0, 1, \beta, 1 + \beta, \beta^2, 1 + \beta^2, \beta + \beta^2, 1 + \beta + \beta^2\}$.

Computation in $\mathbb{F}_8$ is done using the relation $\beta^3 + \beta + 1 = 0$.

Note that $\mathbb{F}_4$ is not contained in $\mathbb{F}_8$. It couldn't be, because $[\mathbb{F}_8 : \mathbb{F}_2] = 3$,
$[\mathbb{F}_4 : \mathbb{F}_2] = 2$, and 2 does not divide 3.
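
The relation $\beta^3 = \beta + 1$ makes computation in the field of order 8 entirely mechanical. In the sketch below an element $c_0 + c_1\beta + c_2\beta^2$ is stored as the integer whose binary digits are $c_2 c_1 c_0$, so that $\beta$ is 2 and $\beta^2$ is 4. This encoding and the function names are assumptions of the illustration, not notation from the text.

```python
def gf8_mul(a, b):
    """Multiply two elements of the field of order 8 (3-bit encoding)."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # add the current multiple of a (addition is XOR)
        b >>= 1
        a *= 2                 # multiply a by beta
        if a & 0b1000:         # a beta^3 term appeared: use beta^3 = beta + 1
            a ^= 0b1011
    return result

def gf8_pow(a, n):
    out = 1
    for _ in range(n):
        out = gf8_mul(out, a)
    return out

BETA = 0b010
powers_of_beta = [gf8_pow(BETA, k) for k in range(7)]

# Roots of the two cubic factors of (6.10).
roots_first = [t for t in range(8) if gf8_pow(t, 3) ^ t ^ 1 == 0]               # x^3 + x + 1
roots_second = [t for t in range(8) if gf8_pow(t, 3) ^ gf8_pow(t, 2) ^ 1 == 0]  # x^3 + x^2 + 1

print(powers_of_beta)                                   # [1, 2, 4, 3, 6, 7, 5]
print(roots_first, roots_second)                        # [2, 4, 6] [3, 5, 7]
print(all(gf8_pow(t, 8) == t for t in range(8)))        # True: every element is a root of x^8 - x
```

The powers of $\beta$ run through all seven nonzero elements, in agreement with part (c) of Theorem (6.4), and the six elements outside the prime field split into the two classes of three described above.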


(c) The field $\mathbb{F}_{16}$: The polynomial $x^{16} - x = x(x^{15} - 1)$ is divisible in Z[x] by
$x^3 - 1$ and by $x^5 - 1$. Carrying out the division over the integers gives this
factorization:


(6.13) $x^{16} - x = x(x - 1)(x^2 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^8 - x^7 + x^5 - x^4 + x^3 - x + 1)$.


This is the irreducible factorization in Z[x]. But in $\mathbb{F}_2[x]$, the factor of degree 8 is not
irreducible, and


(6.14) $x^{16} - x = x(x - 1)(x^2 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^4 + x^3 + 1)(x^4 + x + 1)$.


This factorization displays the three irreducible polynomials of degree 4 in $\mathbb{F}_2[x]$.
Note that the factors of $x^4 - x$ appear among the factors of $x^{16} - x$. This agrees
with the fact that $\mathbb{F}_{16}$ contains $\mathbb{F}_4$.
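
Factorizations such as (6.10) and (6.14) can be checked by multiplying the factors back together modulo 2. The sketch below does only that; it does not find the factors, and it does not test them for irreducibility. Polynomials are coefficient lists with the constant term first, and the names are illustrative.

```python
def mul_mod2(a, b):
    """Product of two polynomials with coefficients in {0, 1}, reduced mod 2."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] ^= x & y
    return out

def product_mod2(factors):
    result = [1]
    for f in factors:
        result = mul_mod2(result, f)
    return result

def x_power_plus_x(n):
    """x^n + x, which equals x^n - x in characteristic 2."""
    return [0, 1] + [0] * (n - 2) + [1]

factors_8 = [
    [0, 1],             # x
    [1, 1],             # x + 1
    [1, 1, 0, 1],       # x^3 + x + 1
    [1, 0, 1, 1],       # x^3 + x^2 + 1
]
factors_16 = [
    [0, 1],             # x
    [1, 1],             # x + 1
    [1, 1, 1],          # x^2 + x + 1
    [1, 1, 1, 1, 1],    # x^4 + x^3 + x^2 + x + 1
    [1, 0, 0, 1, 1],    # x^4 + x^3 + 1
    [1, 1, 0, 0, 1],    # x^4 + x + 1
]

print(product_mod2(factors_8) == x_power_plus_x(8))     # True, confirming (6.10)
print(product_mod2(factors_16) == x_power_plus_x(16))   # True, confirming (6.14)
```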


We will now begin the proof of Theorem (6.4). We will prove the various 
parts in the following order: (d), (c), (a), (b), (e), and (f). 


Proof of Theorem (6.4d). Let K be a field of order q. The multiplicative group
$K^\times$ has order q - 1. Therefore the order of any element $\alpha \in K^\times$ divides q - 1,
so $\alpha^{q-1} = 1$. This means that $\alpha$ is a root of the polynomial $x^{q-1} - 1$. The remaining
element of K, zero, is a root of the polynomial x. So every element of K is a root of
$x(x^{q-1} - 1) = x^q - x$. Since this polynomial has q distinct roots in K, it factors
into linear factors in that field:

(6.15) $x^q - x = \prod_{\alpha \in K} (x - \alpha)$.

This proves part (d) of the theorem. □

Proof of Theorem (6.4c). By an n-th root of unity in a field F, we mean an
element $\alpha$ whose nth power is 1. Thus $\alpha$ is an nth root of unity if and only if it is a root
of the polynomial


(6.16) $x^n - 1$,


or if and only if its order, as an element of the multiplicative group $F^\times$, divides n.
The nonzero elements of a finite field with q elements are (q - 1)-st roots of unity.

In the field of complex numbers, the nth roots of unity form a cyclic group of
order n, generated by


(6.17) $\zeta_n = e^{2\pi i/n}$.


A field need not have many roots of unity. For example, the only real ones are ±1.
But one property of the complex numbers carries over to arbitrary fields: The nth
roots of unity in any field form a cyclic group. For example, in the field $K = \mathbb{F}_4$ of
order 4, the group $K^\times$ is a cyclic group of order 3, generated by $\alpha$. [See (6.3).]


(6.18) Proposition. Let F be a field, and let H be a finite subgroup of the
multiplicative group $F^\times$, of order n. Then H is a cyclic group, and it consists of all the nth
roots of unity in F.


Proof. If H has order n, then the order of an element $\alpha$ of H divides n, so $\alpha$ is
an nth root of unity, a root of the polynomial $x^n - 1$. This polynomial has at most n
roots, so there aren't any other roots in F [Chapter 11 (1.18)]. It follows that H is
the set of all nth roots of unity in F.

It is harder to show that H is cyclic. To do so, we use the Structure Theorem
for abelian groups, which tells us that H is isomorphic to a direct product of cyclic
groups:

$H \approx Z/(d_1) \times \cdots \times Z/(d_k)$,

where $d_1 \mid d_2 \mid \cdots \mid d_k$ and $n = d_1 \cdots d_k$. The order of any element of this product divides
$d_k$ because $d_k$ is a common multiple of all the integers $d_i$. So every element of H is a
root of

$x^{d_k} - 1$.

This polynomial has at most $d_k$ roots in F. But H contains n elements, and
$n = d_1 \cdots d_k$. The only possibility is that $n = d_k$, k = 1, and H is cyclic. □


Proof of Theorem (6.4a). We need to prove the existence of a field with q
elements. Since we have already proved part (d) of the theorem, we know that the
elements of a field of order q are roots of the polynomial $x^q - x$. Also, there exists a
field L containing $\mathbb{F}_p$ in which this polynomial (or any given polynomial) factors into
linear factors (5.3). The natural thing to try is to take such a field L and hope for the
best, namely that the roots of $x^q - x$ form the subfield K of L we are looking for. This is
shown by the following proposition:

(6.19) Proposition. Let p be a prime, and let $q = p^r$.


(a) The polynomial $x^q - x$ has no multiple root in any field L of characteristic p.


(b) Let L be a field of characteristic p, and let K be the set of roots of $x^q - x$ in L.
Then K is a subfield.


This proposition, combined with Proposition (5.3), proves the existence of a field
with q elements.


Proof of Proposition (6.19). (a) The derivative of $x^q - x$ is $qx^{q-1} - 1$. In
characteristic p, the coefficient q is equal to 0, so the derivative is equal to -1. Since
the constant polynomial -1 has no root, $x^q - x$ and its derivative have no common
root. Proposition (5.7) shows that $x^q - x$ has no multiple root.


(b) Let $\alpha, \beta \in L$ be roots of the polynomial $x^q - x$. We have to show that $\alpha \pm \beta$,
$\alpha\beta$, and $\alpha^{-1}$ (if $\alpha \neq 0$) are roots of the same polynomial. This is clear for the
product and quotient: If $\alpha^q = \alpha$ and $\beta^q = \beta$, then $(\alpha\beta)^q = \alpha\beta$ and $(\alpha^{-1})^q = \alpha^{-1}$. It
is not obvious for the sum, and to prove it we use the following proposition:


(6.20) Proposition. Let L be a field of characteristic p, and let $q = p^r$. Then in
the polynomial ring L[x, y], we have $(x + y)^q = x^q + y^q$.


Proof. We first prove the proposition for the case q = p. We expand $(x + y)^p$
in Z[x, y], obtaining


$(x + y)^p = x^p + \binom{p}{1} x^{p-1} y + \binom{p}{2} x^{p-2} y^2 + \cdots + \binom{p}{p-1} x y^{p-1} + y^p$


by the Binomial Theorem. The binomial coefficient $\binom{p}{r}$ is an integer, and if
0 < r < p, it is divisible by p [see the proof of (4.6) in Chapter 11]. It follows that
the map $Z[x, y] \to L[x, y]$ sends these coefficients to zero and that
$(x + y)^p = x^p + y^p$ in L[x, y].

We now treat the general case $q = p^r$ by induction on r: Suppose that the
proposition has been proved for integers less than r and that r > 1. Let $q' = p^{r-1}$.
Then by induction, $(x + y)^q = ((x + y)^{q'})^p = (x^{q'} + y^{q'})^p = (x^{q'})^p + (y^{q'})^p =
x^q + y^q$. □
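
The key fact used in this proof, that the inner binomial coefficients vanish modulo p, can be spot-checked directly. The sketch below uses `math.comb` from the Python standard library (Python 3.8 or later); the function name and the sample values are illustrative choices.

```python
from math import comb

def inner_binomials_mod(q, m):
    """Residues mod m of the coefficients C(q, 1), ..., C(q, q-1) of (x + y)^q."""
    return [comb(q, r) % m for r in range(1, q)]

print(inner_binomials_mod(5, 5))          # [0, 0, 0, 0]: the case q = p = 5
print(any(inner_binomials_mod(9, 3)))     # False: all vanish for q = 3^2, p = 3
print(any(inner_binomials_mod(8, 2)))     # False: all vanish for q = 2^3, p = 2
print(inner_binomials_mod(6, 6))          # [0, 3, 2, 3, 0]: fails when the modulus is not prime
```

The last line shows that primality matters: modulo 6, the expansion of $(x + y)^6$ keeps its middle terms.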


To complete the proof of Proposition (6.19), we evaluate x, y at $\alpha, \beta$ to
conclude that $(\alpha + \beta)^q = \alpha^q + \beta^q$. Then if $\alpha^q = \alpha$ and $\beta^q = \beta$, we find
$(\alpha + \beta)^q = \alpha + \beta$, as required. The case of $\alpha - \beta$ follows by substituting $-\beta$ for
$\beta$. □


Proof of Theorem (6.4b). Let K and K' be fields of order q, and let $\alpha$ be a
generator of the cyclic group $K^\times$. Then K is certainly generated as a field extension
of $F = \mathbb{F}_p$ by the element $\alpha$: $K = F(\alpha)$. Let f(x) be the irreducible polynomial for $\alpha$
over F, so that $K \approx F[x]/(f)$ (2.6). Then $\alpha$ is a root of two polynomials: f(x) and
$x^q - x$. Since f is irreducible, it divides $x^q - x$ (5.4e). We now go over to the
second field K'. Since $x^q - x$ factors into linear factors in K', f has a root $\alpha'$ in K'.
Then $K \approx F[x]/(f) \approx F(\alpha')$. Since K and K' have the same order, $F(\alpha') = K'$;
hence K and K' are isomorphic. □


Proof of Theorem (6.4e). Let f(x) be an irreducible polynomial of degree r in
F[x], where $F = \mathbb{F}_p$ as before. It has a root $\alpha$ in some field extension L of F, and
the subfield $K = F(\alpha)$ of L has degree r over F (3.2). Therefore K has order
$q = p^r$, and by part (d) of the theorem, $\alpha$ is also a root of $x^q - x$. Since f is
irreducible, it divides $x^q - x$, as required.

In order to prove the same thing for irreducible polynomials whose degree k
divides r, it suffices to prove the following lemma:


(6.21) Lemma. Let k be an integer dividing r, say r = ks, and let $q = p^r$,
$q' = p^k$. Then $x^{q'} - x$ divides $x^q - x$.


For if f is irreducible of degree k, then, as above, f divides $x^{q'} - x$, which in turn
divides $x^q - x$ in F[x], for any field F.


Proof of the lemma. This is tricky, because we will use the identity

$y^d - 1 = (y - 1)(y^{d-1} + \cdots + y + 1)$

twice. Substituting $y = q'$ and $d = s$ shows that $q' - 1$ divides $q - 1 = (q')^s - 1$.
Knowing this, we can conclude that $x^{q'-1} - 1$ divides $x^{q-1} - 1$ by substituting
$y = x^{q'-1}$ and $d = (q - 1)/(q' - 1)$. Therefore $x^{q'} - x$ divides $x^q - x$ too. □
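
For small cases the divisibility asserted by Lemma (6.21) can be confirmed by long division modulo p. The sketch below is a brute-force check with illustrative function names. It also shows the division leaving a nonzero remainder in a few cases where k does not divide r.

```python
def x_pow_minus_x(n):
    """Coefficients of x^n - x for n >= 2, constant term first."""
    return [0, -1] + [0] * (n - 2) + [1]

def remainder_mod_p(a, b, p):
    """Remainder of a on division by a monic polynomial b, coefficients mod p."""
    a = [c % p for c in a]
    while len(a) >= len(b):
        lead = a[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - lead * c) % p
        a.pop()                      # the leading coefficient has just been cancelled
    return a

def divides(p, k, r):
    """Does x^(p^k) - x divide x^(p^r) - x modulo p?"""
    rem = remainder_mod_p(x_pow_minus_x(p**r), x_pow_minus_x(p**k), p)
    return not any(rem)

table = [divides(2, k, 4) for k in (1, 2, 3, 4)]
print(table)               # [True, True, False, True]: exactly the k dividing r = 4
print(divides(3, 1, 2))    # True
print(divides(3, 2, 3))    # False, since 2 does not divide 3
```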


So we have shown that every irreducible polynomial whose degree divides r is
a factor of $x^q - x$. On the other hand, if f is irreducible and if its degree k doesn't
divide r, then since [K : F] = r, f doesn't have a root in K, and therefore f doesn't
divide $x^q - x$. □


Proof of Theorem (6.4f). If k does not divide r, then $q = p^r$ is not a power of
$q' = p^k$, so a field of order q can not be an extension of a field of order q'. On the
other hand, if k does divide r, then Lemma (6.21) and part (d) of the theorem show
that the polynomial $x^{q'} - x$ has all its roots in a field K of order q. Now Proposition
(6.19) shows that K contains a field with q' elements. □

This completes the proof of Theorem (6.4).


7. FUNCTION FIELDS


In this section we take a look at function fields, the third class of field extensions 
mentioned in Section 1. The field C(x) of rational functions in one variable x will be 
denoted by F throughout the section. Its elements are fractions g(x) = p(x)/q(x) of 
polynomials p,q € C[x], with q # 0. We usually cancel common factors in p and q 
so that they have no root in common. 

Let us use the symbol P to denote the complex plane, with the complex
coordinate x. A rational function g = p/q determines a complex-valued function of x,
which is defined for all $x \in P$ such that $q(x) \neq 0$, that is, except at the roots of the
polynomial q. Near a root of q, the function defined by g tends to infinity. These
roots are called poles of g. (We usually use the phrase "rational function" to mean an
element of the field of fractions of the polynomial ring. It is unfortunate that the
word function is already there. This prevents us from modifying the phrase in a
natural way when referring to the actual function defined by such a fraction. The
terminology is ambiguous, but this can't be helped.)

A minor complication arises because formal rational functions do not define
functions at certain points, namely at their poles. When working with the whole
field F, we have to face the fact that every value $\alpha$ of x can be a pole of a rational
function, for example of the function $(x - \alpha)^{-1}$. There is no way to choose a
common domain of definition for all rational functions at once. Fortunately this is not a
serious problem, and there are two ways to get around it. One is to introduce an
extra value $\infty$ and to define $g(\alpha) = \infty$ if $\alpha$ is a pole of g. This is actually the better way
for many purposes, but for us another way will be easier. It is simply to ignore bad
behavior at a finite set of points.

Any particular computations we may make will involve finitely many
functions, so they will be valid except at a finite set of points of the plane P, the poles of
these functions. A rational function is determined by its value at any infinite set of
points. This is proved below, in Lemma (7.2). So we can throw finite sets out of the
domain of definition as needed, without losing control of the function. Since a
rational function is continuous wherever it is defined, we can recover its value at a point
$x_0$ which was thrown out unnecessarily, as


(7.1) $g(x_0) = \lim_{x \to x_0} g(x)$.


(7.2) Lemma. If two rational functions $f_1, f_2$ agree at infinitely many points of the
plane, then they are equal elements of F.


Proof. Say that $f_i = p_i/q_i$, where $p_i, q_i \in C[x]$. Let $h(x) = p_1 q_2 - p_2 q_1$. If
h(x) is the zero polynomial, then $f_1 = f_2$. If h(x) is not zero, then it has finitely
many roots, so there are only finitely many points at which $f_1 = f_2$. □


In order to formalize the intuitive procedure of ignoring trouble at finite sets of 
points, it is convenient to have a notation for the result of throwing out a finite set. 
Given an infinite set U, we will denote by U’ a set obtained from U by deleting an 
unspecified finite subset, which is allowed to vary as needed: 


(7.3) U' = U — (variable finite set). 


By a function on U' we mean an equivalence class of complex-valued
functions, each defined except on a finite subset of U. Two such functions f, g are called
equal on U' if there is a finite subset A of U such that f and g are defined and equal
on U - A. (We could also refer to this property by saying that f = g almost
everywhere on U. However, in other contexts, "almost everywhere" often means "except
on a set of measure zero," rather than "except on a finite set.") A function f on U'
will be called continuous if it is represented by a continuous function on some set
U - A.

The set of continuous functions on U' will be denoted by


(7.4) $\mathcal{F}(U)$ = {continuous functions on U'}.


This set forms a ring, with the usual laws of addition and multiplication of functions: 
(7.5) [f + g](x) = f(x) + g(x) and [fg](x) = f(x)g(x).


Lemma (7.2) has the following corollary: 


(7.6) Proposition. The field F = C(x) is isomorphic to a subring of the ring
$\mathcal{F}(P)$, where P is the complex plane. □


Let us now examine one of the simplest function fields in more detail. We are
going to need polynomials with coefficients in the field F. Since the symbol x has
already been assigned, we use y to denote the new variable. We will study the
quadratic field extension K obtained from F by adjoining a root of f(y), where
$f = y^2 - x$. Since f depends on the variable x as well as on y, we will also write


(7.7) $f = f(x, y)$.


The polynomial $y^2 - x$ is an irreducible element of F[y], so K can be constructed as
the abstract field F[y]/(f). The residue of the variable y is a root of f in K.

The importance of function fields comes from the fact that their elements can
be interpreted as actual functions. In our case, we can define a square root function
h, by choosing one of the two values of the square root for each complex number
x: $h(x) = \sqrt{x}$. Then h can be interpreted as a function on P'. However, since
there are two values of the square root whenever $x \neq 0$, we need to make a lot of
choices to define this function. This isn't very satisfactory. If x is real and positive,
it is natural to choose the positive square root, but no choice will give a continuous
function on the whole complex plane.

The locus S of solutions of the equation $y^2 - x = 0$ in $C^2$ is called the
Riemann surface of the polynomial $y^2 - x$ (see Section 8 of Chapter 10). It is
depicted below in Figure (7.9), but in order to obtain a surface in real 3-space, we
have dropped one coordinate. The complex two-dimensional space $C^2$ is identified
with $R^4$ by the usual rule $(x, y) = (x_0 + x_1 i, y_0 + y_1 i) \leftrightarrow (x_0, x_1, y_0, y_1)$. The
figure depicts the locus


(7.8) $\{(x_0, x_1, y_0) \mid y_0 = \text{real part of } (x_0 + x_1 i)^{1/2}\}$.


This is a projection of S from $R^4$ to $R^3$.

(7.9) Figure. The Riemann surface $y^2 = x$.


The Riemann surface S does not cut itself along the negative x_0-axis as the projected 
surface does. Every negative real number x has two purely imaginary square roots, 
but the real parts of these square roots are zero. This produces the apparent 
self-crossing in the projected surface. Actually, S is a two-sheeted branched covering 
of P, as defined in Chapter 10 (8.13), and the only branch point is at x = 0. 

Figure (7.9) shows the problem encountered when we try to define the square 
root as a single-valued function. When x is real and positive, the positive square root 
is the natural choice. We would like to extend this choice continuously over the 
complex plane, but we run into trouble: winding once around the origin in complex 
x-space brings us back to the negative square root. It is better to accept the fact that 
the square root, as a solution of the equation y^2 - x = 0, is a multi-valued function 
on P'. 
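The failure of a continuous single-valued choice can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name `continue_sqrt` and the step count are illustrative assumptions. It starts with the positive square root at x = 1 and, at each small step around the unit circle, keeps whichever of the two square roots is closer to the previous value.

```python
import cmath
import math

def continue_sqrt(steps=1000):
    """Continue the square root continuously along x = e^(it), t from 0 to 2*pi."""
    y = 1.0 + 0j                      # positive square root of x = 1
    for k in range(1, steps + 1):
        x = cmath.exp(2j * math.pi * k / steps)
        r = cmath.sqrt(x)             # one of the two square roots of x
        # keep the candidate (r or -r) that is closer to the previous value
        y = r if abs(r - y) <= abs(-r - y) else -r
    return y

end_value = continue_sqrt()           # value reached after one full turn
```

Under these assumptions the value reached after one turn is the negative square root of 1, in agreement with the discussion above.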

Now there is an amazing trick which will allow us to solve any polynomial 
equation f(x,y) = 0 with a single-valued function, without making arbitrary 
choices. The trick is to replace the complex plane P by the Riemann surface S, the 
locus f(x, y) = 0. We are given two functions on S, namely the restrictions of the 
coordinate functions on C^2. In order to keep things straight, let us introduce new 
symbols for these functions, say X, Y: 


(7.10) X(x, y) = x and Y(x, y) = y, for (x, y) ∈ S. 


These restrictions of the coordinate functions to S are related by the equation 
f(X, Y) = 0, because by definition of S, f(x, y) = 0 at any point of S. 


(7.11) Proposition. Let f(x, y) be an irreducible polynomial in C[x, y] which is 
not a polynomial in x alone, and let S = {(x, y) | f(x, y) = 0} be its Riemann 
surface. Let K = F[y]/(f) be the field extension defined by f. Then K is isomorphic to 
a subring of the ring ℱ(S) of continuous functions on S'. 


Proof. Let g(x) be a rational function. Since X is the restriction of a coordinate 
function on C^2, the composed function g(X) is continuous on S except at the 
points which lie above the poles of g. There are finitely many such points [Chapter 
10 (8.11)]. So g(X) is a continuous function on S'. We define a homomorphism 
F → ℱ(S) by sending g(x) to g(X). Next, the Substitution Principle extends this 
map to a homomorphism 


(7.12) φ: F[y] → ℱ(S), 

by sending y ↦ Y. Since f(X, Y) = 0, the polynomial f(x, y) is in the kernel of φ. 
Since K = F[y]/(f), the mapping property of quotients [Chapter 10 (4.2)] gives us 
a map φ̄: K → ℱ(S) which sends the residue of y to Y. Since K is a field, φ̄ is 
injective. □ 


(7.13) Definition. An isomorphism of branched coverings S_1, S_2 of the plane P is a 
homeomorphism φ': S_1' → S_2' which is compatible with the maps π_i: S_i → P, 
that is, such that π_2' φ' = π_1': 


            φ'
    S_1' -------> S_2'
        \        /
    π_1' \      / π_2'
          v    v
            P


By this we mean that φ' is defined except on a finite set of S_1 and that when suitable 
finite sets are omitted from S_1 and S_2, φ' is a homeomorphism. 


A branched covering S is called connected if the complement S' of an arbitrary 
finite set of S is a path-connected set. 

We will now state a beautiful theorem which describes the finite extensions of 
the field of rational functions. Let ℰ_n denote the set of isomorphism classes of 
extension fields K of F of degree n. Let 𝒞_n denote the set of isomorphism classes of 
connected n-sheeted branched coverings π: S → P of the plane. 


(7.14) Theorem. Riemann Existence Theorem: There is a bijective map 
Φ_n: ℰ_n → 𝒞_n. If K is the extension obtained by adjoining a root of an irreducible 
polynomial f(x, y) ∈ C[x, y], then the class of branched coverings corresponding to 
K is represented by the Riemann surface of f. □ 


The proof of this theorem is a suitable topic for a course in complex variables. 
It requires too much analysis to give here. Using it, however, we can associate a 
branched covering of the plane, unique up to isomorphism, to every finite extension 
field K of F. This covering is called the Riemann surface of the extension field K. 
The Riemann surface of F is the complex plane P itself. 

Here are two striking corollaries of the theorem: 


(7.15) Corollary. Given a connected n-sheeted branched covering S of the plane, 
there is a polynomial f(x, y) of degree n in y whose Riemann surface is isomorphic 
to S. 


This follows from the surjectivity of the map Φ_n and from a fact which will be 
proved in the next chapter [Chapter 14 (4.1)], that every finite extension K of F can 
be obtained by adjoining a single element. □ 


(7.16) Corollary. Let f, g be irreducible polynomials in C[x, y], with Riemann 
surfaces S, T. Let α be a root of f(y) in an extension field of F. If S and T are 
isomorphic branched coverings, then g(y) has a root in F(α). 


This follows from the injectivity of the map Φ_n. □ 


Visualization of Riemann surfaces is complicated by the fact that they are embedded 
in C^2, a four-dimensional real space. One aid to constructing and visualizing 
them is a method known as cut and paste. If we cut the Riemann surface of y^2 - x 
open along the negative real axis, the double locus in Figure (7.9), then it decomposes 
into the two parts re Y > 0 and re Y < 0. Each of these parts projects to the x-plane P 
in a bijective way, if we disregard what happens along the cut. Turning this procedure 
around, we can construct a surface which is homeomorphic to S in the following 
way: We stack two copies P_1, P_2 of the complex plane over P and cut them open 
along the negative real axis (-∞, 0]. These copies of P are called sheets. Then we 
glue side A of P_1 to side B of P_2 and vice versa (see below). Four dimensions are 
needed to embed S without crossings. 


side A of cut 
side B of cut 


(7.17) Figure. 


To construct a general branched covering S of the plane by the cut-and-paste 
procedure, we begin with n copies of the plane P, called sheets. The sheets are 
labelled P_1, ..., P_n and are stacked up over P. We also select a finite set of points 
α_1, ..., α_r of P to be branch points. For each branch point α_ν, we choose a curve C_ν 
beginning at α_ν and going to infinity in an arbitrary direction. This should be done 
in such a way that the curves C_ν do not intersect. The sheets P_i are cut open along 
these curves. Then various sheets are glued to others along opposite edges of the 
cuts. 

To describe the resulting covering S, we need only describe the permutations 
σ_ν by which the sheets are glued together along the cuts. To be specific, we draw a 
small loop ℓ_ν around the point α_ν in the counterclockwise direction. Then if the 
permutation σ_ν sends the index 1 to 3, we glue sheet P_1 to sheet P_3 as we cross C_ν. 
This means that if we start on sheet P_1 and wind once around the loop ℓ_ν, we return 
on sheet P_3. The permutation σ_ν can be arbitrary. 




The points α_ν are called branch points of the surface S because the surface 
decomposes into n disjoint sheets near any other point of P. It won't have n disjoint 
sheets above the point α_ν unless the permutation σ_ν is the identity. If σ_ν = 1, then 
each sheet is glued back to itself along the cut C_ν, so that cut was not needed. But it 
is convenient to allow this as a possibility. Let's call α_ν a true branch point if 
σ_ν ≠ 1. Some of the points α_ν may not be true branch points. However, all true 
branch points are among them. 

It is important to note that the numbering of the sheets is arbitrary and, in 
particular, that the concept of a "top sheet" has no intrinsic meaning for the Riemann 
surface of a polynomial. If there were a top sheet, we could define y as a 
single-valued function by choosing the value on that sheet. One can do this only once 
the Riemann surface has been cut open. This is the whole point: wandering around on 
the surface will lead us from one sheet to another. 

It is not difficult to decide when two such branched coverings are isomorphic. 


(7.18) Proposition. Let S, T be branched coverings which are constructed as 
above, with the same branch points α_ν and the same curves C_ν, but using different 
sets of permutations (σ_1, ..., σ_r) and (τ_1, ..., τ_r). Then S and T are isomorphic 
coverings if and only if the two sets of permutations are conjugate, that is, if and only 
if there is a permutation ρ such that τ_ν = ρ^(-1) σ_ν ρ for all ν. 
Proof. Let σ, C stand for σ_ν, C_ν. Our rule is that P_i is glued to P_{iσ} along C. 
Suppose that we relabel the sheets P_1, ..., P_n, changing the numbers by a 
permutation ρ. To keep old and new labellings straight, let's label the renumbered 
sheets as Q_i. So for every i, P_i is relabelled as Q_{iρ}. The rule now tells us to glue 
P_i = Q_{iρ} to Q_{iσρ} = P_{iσ}. Substituting i = jρ^(-1) shows that the rule glues Q_j 
to Q_{jρ^(-1)σρ}. Thus the permutation which describes this gluing rule is the conjugate 
ρ^(-1) σ_ν ρ of the old permutation σ_ν. Since the covering is not changed by the 
relabelling process, this shows that a conjugate set of permutations defines an 
isomorphic covering. 

Conversely, let φ: S → T be an isomorphism of coverings. Let P_1, ..., P_n be 
the sheets which are used to construct S, and let Q_1, ..., Q_n be those used to 
construct T. Then since P_i is connected and since T, when cut open, is a disjoint 
union of the open sets Q_j, the image of P_i must be contained in a single sheet Q_j. 
Since φ is compatible with the projections to P, which are homeomorphisms except on 
the cuts, the restriction of φ to P_i must be a bijection onto the sheet Q_j. So we can 
renumber the sheets Q_j so that P_i is mapped to Q_i. This changes the permutations τ_ν 
to conjugates, as above. So we may assume that φ carries P_i to Q_i. Also, φ is 
continuous across the cuts. Therefore if crossing the cut C_ν on sheet P_i leads to P_j, 
then, similarly, crossing on Q_i must lead to Q_j. Therefore σ_ν = τ_ν. □ 
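The relabelling computation in the first half of the proof can be checked by brute force for a small number of sheets. The sketch below is only an illustration under stated assumptions: sheets are numbered 0, ..., n-1 instead of 1, ..., n, a permutation is stored as a tuple p with p[i] the image of i (the right action i ↦ iσ of the text), and the helper names are hypothetical.

```python
from itertools import permutations

def compose_right(*perms):
    """Apply the permutations from left to right: i -> ((i p1) p2) ..."""
    result = []
    for i in range(len(perms[0])):
        for p in perms:
            i = p[i]
        result.append(i)
    return tuple(result)

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def relabelled_gluing(sigma, rho):
    """Gluing rule after renaming P_i as Q_{i rho}: Q_{i rho} is glued to Q_{i sigma rho}."""
    tau = [None] * len(sigma)
    for i in range(len(sigma)):
        tau[rho[i]] = rho[sigma[i]]
    return tuple(tau)

def conjugation_rule_holds(n=3):
    """Check that the relabelled rule equals rho^(-1) sigma rho for all sigma, rho in S_n."""
    return all(
        relabelled_gluing(sigma, rho) == compose_right(inverse(rho), sigma, rho)
        for sigma in permutations(range(n))
        for rho in permutations(range(n))
    )
```

For instance, `conjugation_rule_holds(3)` runs through all 36 pairs (σ, ρ) in S_3.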


We can also start with an arbitrary branched covering S and reconstruct it in 
this way: Say that S is branched at the points α_1, ..., α_r ∈ P. As above, we choose 
nonintersecting curves C_ν beginning at α_ν and going to infinity. Then if S is cut open 
above the curves C_ν, it decomposes into n sheets. This is a theorem of topology, 
because the complement of the curves C_ν in P is simply connected [Munkres, Topology 
(p. 342, exc. 8)]. Therefore a covering homeomorphic to S can be reconstructed from 
n sheets P_1, ..., P_n by cutting them open along the curves and gluing together to mix 
up the sheets. 

We will now describe the Riemann surfaces of a few simple polynomials f. 
This is usually difficult to do when f is complicated. 


(7.19) Example. The Riemann surface of y^3 - x: Here y represents a cube root of 
x, and S is a three-sheeted covering of P. The only branch point is x = 0. We cut S 
open above the positive real axis C = [0, ∞). This decomposes S into three sheets 
P_1, P_2, P_3, and it is reasonable to guess that the gluing along the cut is done by a 
cyclic permutation. 

This case is fairly easy to analyze because x is a single-valued function of y. 
Because of this, we can interpret S as the graph of a function from y-space to 
x-space, which implies that the projection of S onto the complex y-plane is bijective. 
We identify S with the y-plane using this projection and cut it open above C. This 
will decompose the plane into three parts corresponding to the sheets P_i. The rules 
for gluing will be evident when this decomposition is made. 

The values of y lying over the cut C are those for which y^3 = x is real and 
positive. They are y = re^(iθ), where θ = 0, 2π/3, or 4π/3. So the sheets are sectors. 
[Figure: x-plane; y-plane] 


In the figure, the sectors have been numbered arbitrarily. Note that under the map 
y ↦ y^3 = x, each of the three sectors is stretched radially and maps bijectively to 
the entire plane, disregarding the cut. As we move along S to cross the cut in the 
x-plane, we also cross one of the three cuts in the y-plane. As predicted, this 
permutes the sheets by the cyclic permutation (123). □ 


(7.20) Example. The Riemann surface of f(x, y) = y^3 - 3y - x: The points x at which 
this polynomial has fewer than three roots are found by solving the equations 
f = ∂f/∂y = 0 [see Chapter 10 (8.12)]. Here ∂f/∂y = 3(y^2 - 1). So the solutions are 
y = ±1, and hence x = ∓2. We may cut S open above the curves C_1 = (-∞, -2] and 
C_2 = [2, ∞), to decompose it into three sheets. 

Again, x is a single-valued function of y, and we can analyze the gluing of the 
sheets by cutting the y-plane apart suitably. To do so, we ask for the values of y such 
that x lies on one of the curves C_ν. Since these curves are on the real x-axis, we 
begin by solving the equation im x = 0. Setting y = u + vi, we find im x = 
im(y^3 - 3y) = v(3u^2 - v^2 - 3). The solutions are the u-axis v = 0 and the two 
branches of the hyperbola 3u^2 - v^2 = 3. The points on the u-axis in the interval 
(-2, 2) correspond to x ∈ (-2, 2), so they do not lie over the cuts. 


[Figure: x-plane; y-plane with the sheets P_1, P_2, P_3] 
Again, each of the three regions into which the y-plane decomposes is mapped 
bijectively to the x-plane by the function y^3 - 3y, disregarding the cut as always. In 
the figure, the dotted curves are those which lie over C_1. The figure shows that 
moving on S to cross the curve (-∞, -2] interchanges the sheets P_2, P_3, leaving P_1 
alone, and similarly that crossing above [2, ∞) interchanges P_1, P_2. So the branching 
is described by the transposition (23) at the branch point x = -2 and by (12) at 
x = 2. □ 
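The gluing data of this example can also be checked numerically by following the three roots of y^3 - 3y - x around a small loop. The sketch below rests on several assumptions which are not in the text: the base point is x = 0, where the roots are -√3, 0, √3; the sheets are numbered 0, 1, 2 in that order (corresponding to P_1, P_2, P_3 for one admissible numbering); the loop has radius 0.5; and each root is continued by Newton's method. All function names are illustrative.

```python
import cmath
import math

def f(x, y):
    return y**3 - 3*y - x

def df_dy(y):
    return 3*y**2 - 3

def track(y, path):
    """Follow one root y of f(x, y) = 0 along a list of x values by Newton's method."""
    for x in path:
        for _ in range(8):
            y -= f(x, y) / df_dy(y)
    return y

def loop_around(a, r=0.5, steps=400):
    """Path from 0 to the circle of radius r about a, once around it, and back to 0."""
    theta0 = cmath.phase(-a)                     # direction from a toward the base point 0
    start = a + r * cmath.exp(1j * theta0)
    out = [start * k / steps for k in range(1, steps + 1)]
    circle = [a + r * cmath.exp(1j * (theta0 + 2 * math.pi * k / steps))
              for k in range(1, steps + 1)]
    back = [start * (steps - k) / steps for k in range(1, steps + 1)]
    return out + circle + back

def monodromy(a):
    """Index of the root reached from each base root after one loop about x = a."""
    base = [-math.sqrt(3), 0.0, math.sqrt(3)]    # roots of y^3 - 3y = 0, in increasing order
    path = loop_around(a)
    ends = [track(complex(y0), path) for y0 in base]
    return [min(range(3), key=lambda j: abs(y1 - base[j])) for y1 in ends]
```

With this numbering, `monodromy(2)` interchanges the first two roots and `monodromy(-2)` the last two, which matches the transpositions (12) at x = 2 and (23) at x = -2.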


(7.21) Example. The Riemann surface of y^2 - x^3 + x^2: There are two points 
x = 0, 1 above which S has fewer than two points. However, at x = 0 the sheets 
cross without getting mixed up, so the only true branch point is x = 1. To see this 
we make the change of variable x = x, z = y/x, which is defined and invertible 
except at x = 0. Then z^2 - x + 1 = 0. The given surface S becomes homeomorphic 
to the Riemann surface of z^2 - x + 1 when the points above the origin are deleted, 
and the surface can be reduced to (7.9) by a translation in the x-plane. □ 


When it is not possible to solve for x as a single-valued function of y, the prob- 
lem of describing the gluing data becomes more difficult. We will work out one ex- 
ample of this type. 


(7.22) Example. The Riemann surface of y^2 - (x^3 - x): There are three points at 
which x^3 - x = 0, namely x = 0, ±1, and the surface has three branch points at 
which it behaves like the Riemann surface of y^2 - x at the origin. Our systematic 
procedure is to make cuts from these three branch points to infinity, but in this case 
another choice of cuts is easier to analyze. The values of x such that y is purely 
imaginary are the real x such that x^3 - x ≤ 0. These are the points in the two intervals 
(-∞, -1] and [0, 1]. If we cut S open along these two intervals, it will decompose 
into the parts re y > 0 and re y < 0. Thus we can reconstruct the surface S by 
stacking up two copies of P, cutting them open along the intervals and gluing to mix 
up the sheets as before. 


(7.23) Figure. 


The fact that a surface constructed by the cut-and-paste method crosses itself 
along the cuts makes it confusing to visualize directly. But since the cuts are along 
the real axis in this example, we can avoid crossings by turning one of the sheets 
over. This ruins the representation of S as a double covering of P, but the advantage 
is that the sheets are now glued along the same side of the cut. There are two such 
cuts in Figure (7.23). Turning one sheet over and stretching to pull the slits apart 
before gluing results in the following picture: 

[Figure] 

This Riemann surface is homeomorphic to a torus with one point deleted. □ 
8. TRANSCENDENTAL EXTENSIONS 


In this section we will take a brief look at some transcendental field extensions. We 
saw in Proposition (2.5) that the structure of the field extension F(α) generated by a 
single transcendental element α over a field F does not depend on the element α. But 
if two transcendental elements α, β are adjoined at the same time, the structure of 
the field F(α, β) which is obtained will depend on whether or not the elements α and 
β are algebraically related, and if they are related, the structure will depend on the 
nature of this relation. For example, α = √π and β = √(√π - 1) are transcendental 
numbers over Q, which are related by the equation 


β^2 - α + 1 = 0. 


In general, we call a set of elements {α_1, ..., α_n} of an extension field K ⊃ F 
algebraically dependent over F if there is a nonzero polynomial in n variables 
f(x_1, ..., x_n) ∈ F[x_1, ..., x_n] such that 


f(α_1, ..., α_n) = 0, 


and we call them algebraically independent over F if there is no such polynomial. 
Thus √π and √(√π - 1) are algebraically dependent over Q. It is conjectured 
that e and π are algebraically independent, but this has not been proved. 

We can interpret algebraic independence in terms of the substitution 
homomorphism φ: F[x_1, ..., x_n] → K sending f(x_1, ..., x_n) ↦ f(α_1, ..., α_n). The 
elements α_1, ..., α_n are algebraically independent if ker φ = 0, that is, if φ is 
injective, and algebraically dependent otherwise. Passing to fields of fractions gives 
this proposition: 


(8.1) Proposition. If α_1, ..., α_n are algebraically independent, then F(α_1, ..., α_n) 
is isomorphic to the field F(x_1, ..., x_n) of rational functions in x_1, ..., x_n, the 
field of fractions of F[x_1, ..., x_n]. □ 


An extension of the form F(α_1, ..., α_n), where the α_i are algebraically independent, 
is called a pure transcendental extension. 


(8.2) Definition. A transcendence basis for a field extension K of F is a set of 
elements (α_1, ..., α_n) which are algebraically independent and such that K is an 
algebraic extension of the field F(α_1, ..., α_n). 
(8.3) Theorem. Let (α_1, ..., α_m) and (β_1, ..., β_n) be elements in an extension K of 
a field F. Assume that K is algebraic over F(β_1, ..., β_n) and that α_1, ..., α_m are 
algebraically independent over F. Then m ≤ n, and (α_1, ..., α_m) can be completed to a 
transcendence basis for K by adding at most (n - m) of the elements β_i. 


We leave the proof of this theorem as an exercise. □ 


(8.4) Corollary. Any two transcendence bases for an extension F ⊂ K have the 
same number of elements. □ 


(8.5) Definition. The transcendence degree of K is the number of elements in a 
transcendence basis, or is infinite if no finite transcendence basis exists. 


(8.6) Examples 


(a) The fields F(x_1, ..., x_n) of rational functions in n variables are not isomorphic 
extensions of F for different values of n, because (x_1, ..., x_n) is a transcendence basis. 


(b) Let α, β be as at the beginning of the section. The single element π forms a 
transcendence basis for K = Q(α, β) over Q. Therefore (8.3) implies that, as was 
asserted above, any two elements of K are algebraically dependent. The element β 
is another transcendence basis. 


(c) Consider any two polynomials or rational functions in one variable f, g ∈ F(x). 
There is a nonzero polynomial φ(y, z) ∈ F[y, z] such that φ(f, g) = 0. For, the 
transcendence degree of F(x) is 1, and hence f, g are algebraically dependent. 


Most field extensions aren't pure transcendental, though this may be difficult to 
decide for a particular extension. Here are two examples: 


(8.7) Proposition. 


(a) The function field L = C(x)[y]/(y^2 - x^3) is a pure transcendental extension of 
C. It is the field of rational functions in t = y/x. 

(b) The function field K = C(x)[y]/(y^2 - x^3 + x) is not a pure transcendental 
extension of C. That is, there is no element t ∈ K such that K = C(t). 


Proof. In both cases, the transcendence degree of the field over C is 1, because x 
is a transcendence basis. 


(a) Let t = y/x. Then C(t) ⊂ L because t ∈ L. Now L is generated by x and y, by 
definition. On the other hand, x = t^2 and y = t^3. Therefore L = C(t). Since L has 
transcendence degree 1, (8.4) shows that t is transcendental. 


(b) (Sketch) To show that K is not a field of rational functions, we appeal to the ge- 
ometry of its Riemann surface. We saw in the last section that this surface is a torus 
from which one point has been deleted. On the other hand, the Riemann surface of 
the field of rational functions C(t) is the complex plane itself. Now, it is a theorem 
of topology that the torus and the plane are not homeomorphic and that they do not 
become homeomorphic when finite sets are deleted. If we admit this theorem, then 
the next proposition will complete the proof. 


(8.8) Proposition. Let K = C(x)[y]/(f) and L = C(t)[u]/(g) be function fields 
with Riemann surfaces S, T respectively. A homomorphism φ: L → K which is the 
identity on the subfield C induces a map φ*: S' → T between their Riemann 
surfaces, which is defined and continuous except on a finite set of points of S. If φ is 
an isomorphism, then φ* becomes a homeomorphism when suitable finite sets are 
deleted from S and T. 


Note that the map φ* goes from the Riemann surface of K to that of L, in the 
opposite direction from φ. 


Proof. The Riemann surface T is the locus g(t, u) = 0 in C^2. According to 
Proposition (7.11), every element α ∈ K defines a continuous function on S', so 
the pair of functions (φ(t), φ(u)) defines a continuous map S' → C^2. Since 
g(t, u) = 0 in L and since φ is a homomorphism which leaves the coefficients of g 
fixed, g(φ(t), φ(u)) = 0 too. So S' is mapped to T. This is the required map φ*. If 
φ is an isomorphism, its inverse defines a map T' → S which is an inverse function 
to φ* on the complement of a finite set. □ 


9. ALGEBRAICALLY CLOSED FIELDS 


A field F is said to be algebraically closed if every polynomial f(x) € F [x] of posi- 
tive degree has a root in F. The fact that the field C of complex numbers is alge- 
braically closed is called the Fundamental Theorem of Algebra. 


(9.1) Theorem. Fundamental Theorem of Algebra: Every nonconstant polynomial 
with complex coefficients has a complex root. 


We have used this theorem often already. A proof is at the end of the section. 

If a field F is algebraically closed, then every nonconstant polynomial 
f(x) ∈ F[x] has a linear factor x - α, so the only irreducible polynomials are the 
linear ones. Consequently every polynomial is a product of linear factors. Also, 
there are no algebraic extensions of F other than F itself (whence the phrase 
algebraically closed). For if α is algebraic over F, then it is a root of a monic 
irreducible polynomial f(x) ∈ F[x]. This polynomial must have the form x - α, so α ∈ F. 

It may be convenient to think of a field F which is being studied as a subfield 
of an algebraically closed field. For instance, we like to think of number fields as 
subfields of C. Let us call an extension field K of F an algebraic closure of F if 


(9.2) (i) K is algebraic over F, and 
(ii) K is algebraically closed. 
(9.3) Corollary. Let F be a subfield of C. The subset F̄ of C consisting of all 
numbers which are algebraic over F is an algebraic closure of F. 


Proof. The fact that F̄ is a field has been proved (3.10). To show that F̄ is 
algebraically closed, let f(x) ∈ F̄[x] be a nonconstant polynomial. Then f(x) has a 
root α in C, and F̄(α) is algebraic over F̄. Since F̄ is algebraic over F, α is algebraic 
over F by (3.11). So α ∈ F̄. □ 


It is not hard to construct an algebraic closure of a finite field 𝔽_p, as a union of 
the fields 𝔽_q, where q = p^r is a power of p. To do this, we choose a sequence of 
integers r_1, r_2, ... with these properties: (i) r_i divides r_{i+1}, and (ii) every 
positive integer n divides some r_i. We may take r_i = i!, for example. We set 
q_i = p^(r_i) and F_i = 𝔽_{q_i}. It follows from (i) that F_{i+1} contains a subfield 
isomorphic to F_i (6.4), so we can build a tower of fields F_1 ⊂ F_2 ⊂ ···. Let F̄ be the 
union of this chain of fields. Then (ii) tells us that every finite field 𝔽_q, q = p^n, is 
isomorphic to a subfield of some F_i, hence to a subfield of F̄. This field is an 
algebraic closure of 𝔽_p. 
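The two properties of the sequence r_i = i! are easy to check by machine. The sketch below only manipulates exponents. It uses the standard fact that the field with p^a elements is isomorphic to a subfield of the field with p^b elements exactly when a divides b, and the function names are illustrative assumptions.

```python
from math import factorial

def embeds(a, b):
    """True when the field with p^a elements embeds in the field with p^b elements (a | b)."""
    return b % a == 0

def tower_exponents(count):
    """Exponents r_1, ..., r_count of the tower, with r_i = i!."""
    return [factorial(i) for i in range(1, count + 1)]

def first_level_containing(n):
    """Smallest i such that n divides r_i = i!, so that the field with p^n elements embeds in F_i."""
    i = 1
    while not embeds(n, factorial(i)):
        i += 1
    return i

# Property (i): each exponent divides the next one.
chain_ok = all(embeds(r, s) for r, s in zip(tower_exponents(8), tower_exponents(8)[1:]))
```

Since n divides n!, the search in `first_level_containing` always stops at some i ≤ n, which is property (ii).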

The following theorem can be proved using Zorn’s Lemma. 


(9.4) Theorem. Every field F has an algebraic closure, and if K_1, K_2 are two 
algebraic closures of F, there is an isomorphism φ: K_1 → K_2 which is the identity map 
on the subfield F. □ 


Thus the algebraic closure is essentially unique. 


(9.5) Corollary. Let F̄ be an algebraic closure of F, and let K be any algebraic 
extension of F. There is a subextension K' ⊂ F̄ isomorphic to K. □ 


Proof of the Fundamental Theorem of Algebra. To show that f(x_0) = 0, it is 
enough to show that the absolute value |f(x_0)| is zero. The existence of such a value 
x_0 ∈ C is proved by the following two lemmas: 


(9.6) Lemma. Let f(x) be a nonconstant polynomial, and let x_0 ∈ C be a point at 
which f(x_0) ≠ 0. Then |f(x_0)| is not the minimum value of |f(x)|. 


(9.7) Lemma. Let f(x) be a complex polynomial. Then |f(x)| takes on a 
minimum value at some point x_0 ∈ C. 


Proof of Lemma (9.6). We first note that the polynomial x^k - c has a root for 
all c ∈ C. A nonnegative real number r has a real kth root because the continuous 
function x^k, which is zero when x = 0 and large when x is a large real number, 
takes on all real values ≥ 0, by the Intermediate Value Theorem. We write the 
complex number c in the form c = re^(iθ), where r = |c| and θ = arg c. Let s be a real 
kth root of r. Then the required kth root of c is 


(9.8) α = se^(iθ/k). 


Now let f(x) be a nonconstant polynomial, and let x_0 ∈ C be a point at which 
f(x_0) ≠ 0. It is convenient to normalize f. We make a change of variable, replacing 
x by x + x_0, to shift the point in question to the origin: x_0 = 0. We also multiply 
f(x) by f(0)^(-1). Then f(0) = 1, and we must show that 1 is not the minimum value of 
|f(x)|. 


Let k denote the lowest nonzero power of x occurring in f, so that 

f(x) = 1 + ax^k + (terms of degree > k). 


Let α be a kth root of -a^(-1). We make a final change of variable, replacing x by αx. 
Then f takes the form 


f(x) = 1 - x^k + (higher-degree terms) = 1 - x^k + x^(k+1) g(x), 


for some polynomial g(x). For small positive real x, the Triangle Inequality shows 
that 


|f(x)| ≤ |1 - x^k| + |x^(k+1) g(x)| = 1 - x^k + x^(k+1)|g(x)| = 1 - x^k(1 - x|g(x)|). 


Since x|g(x)| is small for small x, the term x^k(1 - x|g(x)|) is positive when x is a 
sufficiently small positive real number. For such x, |f(x)| < |f(0)|. □ 
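The normalization in this proof can be followed step by step on a computer. The sketch below is an illustration only: polynomials are stored as coefficient lists with `coeffs[i]` the coefficient of x^i, the tolerance `tol` used to detect the lowest nonzero power is an assumption, and the function names are hypothetical. It returns the number α of the proof, so that |f(x_0 + αt)| < |f(x_0)| for sufficiently small t > 0.

```python
import cmath
from math import comb

def evaluate(coeffs, x):
    """Horner evaluation; coeffs[i] is the coefficient of x^i."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value

def shifted(coeffs, x0):
    """Coefficients of f(x + x0), expanded by the binomial theorem."""
    n = len(coeffs)
    return [sum(coeffs[i] * comb(i, j) * x0 ** (i - j) for i in range(j, n))
            for j in range(n)]

def descent_direction(coeffs, x0, tol=1e-12):
    """Return alpha as in the proof of Lemma (9.6), assuming f is nonconstant and f(x0) != 0."""
    c = shifted(coeffs, x0)
    k = next(j for j in range(1, len(c)) if abs(c[j]) > tol)   # lowest nonzero power
    a = c[k] / c[0]                  # after dividing by f(x0): 1 + a x^k + ...
    target = -1 / a                  # alpha must satisfy alpha^k = -a^(-1)
    s = abs(target) ** (1 / k)       # real kth root of the modulus
    return s * cmath.exp(1j * cmath.phase(target) / k)         # formula (9.8)
```

For f(x) = x^2 + 1 and x_0 = 0 this gives α = i, and indeed |f(it)| = 1 - t^2 < 1 for small t > 0.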


Proof of Lemma (9.7). We may assume that f is not a constant polynomial. For 
large x, f(x) is also large: 


(9.9) |f(x)| → ∞ as |x| → ∞. 


To prove this, the constant term of f is irrelevant, so we may suppose that it is zero. 
Then f(x) is divisible by x: f(x) = xg(x). By induction on the degree, the assertion 
is true for g(x), or else g(x) is constant, and it follows for f(x) as well. 

Now since f(x) is large for large x, the greatest lower bound m of |f(x)| in the 
whole complex plane is also the greatest lower bound in a sufficiently large disc 
|x| ≤ r. Since the disc is compact and |f(x)| is a continuous function, it takes on a 
minimum value in the disc. □ 


There are several other proofs of the Fundamental Theorem of Algebra, and 
one of them is particularly appealing, though it is not as easy to make precise as the 
one just given. We will present it in outline. As before, our problem is to prove that 
a nonconstant polynomial 


(9.10) f(z) = z^n + a_{n-1} z^(n-1) + ··· + a_1 z + a_0 


has a root. If a_0 = 0, then 0 is a root, so we may assume that a_0 ≠ 0. We consider 
the function f: C → C defined by the polynomial (9.10). 

Let C_r denote a circle of radius r about the origin. We study the image f(C_r) 
of the circle C_r. To do this, we use polar coordinates, writing z = re^(iθ). Then 
z^n = r^n e^(inθ). As θ runs from 0 to 2π, the point z winds once around the circle of 
radius r. At the same time, nθ runs from 0 to 2πn, so the point z^n winds n times 
around the circle of radius r^n. 

For sufficiently large r, the term z” is dominant in the expression (9.10), and 
we will have 


|f(z) - z^n| < (1/2) r^n. 
The proof of this fact is similar to the proof of Lemma (9.6). For our purposes, the 
factor 1/2 could be replaced by any positive real number less than 1. This inequality 
shows us that, as z^n winds n times around the circle of radius r^n, f(z) also winds n 
times around the origin. A good way to visualize this conclusion is with the 
dog-on-a-leash model. If someone walks a dog n times around the block, the dog also 
goes around n times, though following a different path. This will be true provided 
that the leash is shorter than the radius of the block. Here z^n represents the position 
of the person at the time θ, and f(z) represents the position of the dog. The length of 
the leash is (1/2) r^n. 

We now vary the radius r. Since f is a continuous function, the image f(C_r) 
will vary continuously with r. When the radius r is very small, f(C_r) makes a small 
loop around the constant term a_0 of f. This small loop won't wind around the origin 
at all. But as we just saw, f(C_r) winds n times around the origin if r is large enough. 
The only explanation for this is that for some intermediate radius r', f(C_r') passes 
through the origin. This means that for some point α on the circle C_r', f(α) = 0. 
This number α is a root of f. 

Note that all n loops have to cross the origin, which agrees with the fact that a 
polynomial of degree n has n roots. 
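The winding of f(C_r) around the origin can be observed numerically. The following sketch is an illustration under stated assumptions: the polynomial z^3 - 2z + 2 is an arbitrary example, the winding number is estimated by adding up small changes of argument along a sampled circle, and the function names are hypothetical.

```python
import cmath
import math

def evaluate(coeffs, z):
    """Horner evaluation; coeffs[i] is the coefficient of z^i."""
    value = 0
    for c in reversed(coeffs):
        value = value * z + c
    return value

def winding_number(coeffs, r, steps=4000):
    """Number of times f(C_r) winds around 0, assuming f has no root on the circle C_r."""
    total = 0.0
    prev = evaluate(coeffs, complex(r))
    for k in range(1, steps + 1):
        cur = evaluate(coeffs, r * cmath.exp(2j * math.pi * k / steps))
        total += cmath.phase(cur / prev)   # small change of argument between samples
        prev = cur
    return round(total / (2 * math.pi))

example = [2, -2, 0, 1]                    # f(z) = z^3 - 2z + 2
small, large = winding_number(example, 0.1), winding_number(example, 10)
```

For this example the small circle gives winding number 0 and the large one gives 3, so the image must pass through the origin for some intermediate radii.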


I don’t consider this algebra, 
but this doesn’t mean that algebraists can’t do it. 


Garrett Birkhoff 


EXERCISES 
1. Examples of Fields 


1. Let F be a field. Find all elements a ∈ F such that a = a^(-1). 
2. Let K be a subfield of C which is not contained in R. Prove that K is a dense subset of C. 


3. Let R be an integral domain containing a field F as subring and which is finite-dimen- 
sional when viewed as vector space over F. Prove that R is a field. 


4. Let F be a field containing exactly eight elements. Prove or disprove: The characteristic 
of F is 2. 


2. Algebraic and Transcendental Elements 


1. Let α be the real cube root of 2. Compute the irreducible polynomial for 1 + α^2 over Q. 
2. Prove Lemma (2.7), that (1, α, α^2, ..., α^(n-1)) is a basis of F[α]. 
3. Determine the irreducible polynomial for α = √3 + √5 over each of the following 
fields. 
(a) Q (b) Q(√5) (c) Q(√10) (d) Q(√15) 
4. Let α be a complex root of the irreducible polynomial x^3 - 3x + 4. Find the inverse of 
α^2 + α + 1 in F(α) explicitly, in the form a + bα + cα^2, a, b, c ∈ Q. 

5. Let K = F(α), where α is a root of the irreducible polynomial f(x) = 
x^n + a_{n-1} x^(n-1) + ··· + a_1 x + a_0. Determine the element α^(-1) explicitly in terms 
of α and of the coefficients a_i. 

6. Let β = ζ ∛2, where ζ = e^(2πi/3), and let K = Q(β). Prove that -1 can not be written as 
a sum of squares in K. 


3. The Degree of a Field Extension 


1. Let F be a field, and let α be an element which generates a field extension of F of degree 
5. Prove that α^2 generates the same extension. 

2. Let ζ = e^(2πi/7), and let η = e^(2πi/5). Prove that η ∉ Q(ζ). 

3. Define ζ_n = e^(2πi/n). Find the irreducible polynomial over Q of (a) ζ_4, (b) ζ_6, (c) ζ_8, 
(d) ζ_9, (e) ζ_10, (f) ζ_12. 

4. Let ζ_n = e^(2πi/n). Determine the irreducible polynomial over Q(ζ_3) of (a) ζ_6, (b) ζ_9, 
(c) ζ_12. 


5. Prove that an extension K of F of degree 1 is equal to F. 

6. Let a ∈ Q be an element which is not a square in Q. Prove that ⁴√a has degree 4 over 
Q. 

7. Decide whether or not i is in the field (a) Q(√-2), (b) Q(⁴√-2), (c) Q(α), where 
α^3 + α + 1 = 0. 

8. Let K be a field generated over F by two elements α, β of relatively prime degrees m, n 
respectively. Prove that [K:F] = mn. 

9. Let α, β be complex numbers of degree 3 over Q, and let K = Q(α, β). Determine the 
possibilities for [K:Q]. 

10. Let α, β be complex numbers. Prove that if α + β and αβ are algebraic numbers, then 
α and β are also algebraic. 

11. Let α, β be complex roots of irreducible polynomials f(x), g(x) ∈ Q[x]. Let F = Q[α] 
and K = Q[β]. Prove that f(x) is irreducible in K[x] if and only if g(x) is irreducible in F[x]. 

12. (a) Let F ⊂ F' ⊂ K be field extensions. Prove that if [K:F] = [K:F'], then F = F'. 
(b) Give an example showing that this need not be the case if F is not contained in F'. 

13. Let α_1, ..., α_k be elements of an extension field K of F, and assume that they are all 
algebraic over F. Prove that F(α_1, ..., α_k) = F[α_1, ..., α_k]. 

14. Prove or disprove: Let α, β be elements which are algebraic over a field F, of degrees 
d, e respectively. The monomials α^i β^j with i = 0, ..., d - 1, j = 0, ..., e - 1 form a 
basis of F(α, β) over F. 

15. Prove or disprove: Every algebraic extension is a finite extension. 


4. Constructions with Ruler and Compass 


1. Express cos 15° in terms of square roots.

2. Prove that the regular pentagon can be constructed by ruler and compass (a) by field
theory, and (b) by finding an explicit construction.

3. Derive formula (4.12).

4. Determine whether or not the regular 9-gon is constructible by ruler and compass. 

5. Is it possible to construct a square whose area is equal to that of a given triangle? 

6. Let α be a real root of the polynomial x³ + 3x + 1. Prove that α can not be constructed
by ruler and compass.

7. Given that 7 is a transcendental number, prove the impossibility of squaring the circle 
by ruler and compass. (This means constructing a square whose area is the same as the 
area of a circle of unit radius.) 


8. Prove the impossibility of “duplicating the cube,” that is, of constructing the side length
of a cube whose volume is 2.

9. (a) Referring to the proof of Proposition (4.8), prove that the discriminant D is negative 
if and only if the circles do not intersect. 

(b) Determine the line which appears at the end of the proof of Proposition (4.8) geo- 
metrically if D = 0 and also if D < 0. 

10. Prove that if a prime integer p has the form 2^r + 1, then it actually has the form
2^{2^k} + 1.

11. Let C denote the field of constructible real numbers. Prove that C is the smallest subfield
of R with the property that if a ∈ C and a > 0, then √a ∈ C.

12. The points in the plane can be considered as complex numbers. Describe the set of con- 

structible points explicitly as a subset of C. 

13. Characterize the constructible real numbers in the case that three points are given in the 
plane to start with. 
*14. Let the rule for construction in three-dimensional space be as follows: 

(i) Three non-collinear points are given. They are considered to be constructed. 

(ii) One may construct a plane through three non-collinear constructed points. 

(iii) One may construct a sphere with center at a constructed point and passing through 
another constructed point. 

(iv) Points of intersection of constructed planes and spheres are considered to be con- 
structed if they are isolated points, that is, if they are not part of an intersection 
curve. 

Prove that one can introduce coordinates, and characterize the coordinates of the con- 

structible points. 


5. Symbolic Adjunction of Roots 


1. Let F be a field of characteristic zero, let f′ denote the derivative of a polynomial
f ∈ F[x], and let g be an irreducible polynomial which is a common divisor of f and f′.
Prove that g² divides f.

2. For which fields F and which primes p does x^p − x have a multiple root?

3. Let F be a field of characteristic p.

(a) Apply (5.7) to the polynomial x^p + 1.
(b) Factor this polynomial into irreducible factors in F[x].

4. Let α_1, ..., α_n be the roots of a polynomial f ∈ F[x] of degree n in an extension field K.
Find the best upper bound that you can for [F(α_1, ..., α_n) : F].
6. Finite Fields


1. Identify the group F_4^+.

2. Write out the addition and multiplication tables for F_4 and for Z/(4), and compare them.

3. Find a thirteenth root of 3 in the field F_13.

4. Determine the irreducible polynomial over F_2 for each of the elements (6.12) of F_8.

5. Determine the number of irreducible polynomials of degree 3 over the field F_3.

6. (a) Verify that (6.9, 6.10, 6.13) are irreducible factorizations over F_2.
(b) Verify that (6.11, 6.13) are irreducible factorizations over Z.

7. Factor x⁹ − x and x²⁷ − x in F_3. Prove that your factorizations are irreducible.

8. Factor the polynomial x¹⁶ − x in the fields (a) F_4 and (b) F_8.

9. Determine all polynomials f(x) in F_q[x] such that f(α) = 0 for all α ∈ F_q.

10. Let K be a finite field. Prove that the product of the nonzero elements of K is −1.

11. Prove that every element of F_p has exactly one pth root.

12. Complete the proof of Proposition (6.19) by showing that the difference α − β of two
roots of x^q − x is a root of the same polynomial.

13. Let p be a prime. Describe the integers n such that there exist a finite field K of order n
and an element α ∈ K^× whose order in K^× is p.


14. Work this problem without appealing to Theorem (6.4).

(a) Let F = F_p. Determine the number of monic irreducible polynomials of degree 2 in
F[x].

(b) Let f(x) be one of the polynomials described in (a). Prove that K = F[x]/(f) is a
field containing p² elements and that the elements of K have the form a + bα, where
a, b ∈ F and α is a root of f in K. Show that every such element a + bα with b ≠ 0
is the root of an irreducible quadratic polynomial in F[x].

(c) Show that every polynomial of degree 2 in F[x] has a root in K.

(d) Show that all the fields K constructed as above for a given prime p are isomorphic.

15. The polynomials f(x) = x³ + x + 1, g(x) = x³ + x² + 1 are irreducible over F_2. Let
K be the field extension obtained by adjoining a root of f, and let L be the extension
obtained by adjoining a root of g. Describe explicitly an isomorphism from K to L.

16. (a) Prove Lemma (6.21) for the case F = C by looking at the roots of the two
polynomials.

(b) Use the principle of permanence of identities to derive the conclusion when F is an
arbitrary ring.


7. Function Fields


1. Determine a real polynomial in three variables whose locus of zeros is the projected
Riemann surface (7.9).

2. Prove that the set ℱ(U) of continuous functions on U forms a ring.

3. Let f(x) be a polynomial in F[x], where F is a field. Prove that if there is a rational
function r(x) such that r² = f, then r is a polynomial.

4. Referring to the proof of Proposition (7.11), explain why the map F → ℱ(S) defined
by g(x) ↦ g(x) is a homomorphism.
5. Determine the branch points and the gluing data for the Riemann surfaces of the
following polynomials.

(a) y2-x? +1 (b) vi-x (ce) yt*-—x-1 (dd) y?—2xy- x 

(eh yea ee (Ff) vy? = xe =) (eg) y = 2G — Th ye 

(i) ey = ek 


6. (a) Determine the number of isomorphism classes of function fields K of degree 3 over
F = C(x) which are ramified only at the points ±1.

(b) Describe the gluing data for the Riemann surface corresponding to each isomorphism
class of fields as a pair of permutations.

(c) For each isomorphism class, determine a polynomial f(x, y) such that K = F[y]/(f)
represents the isomorphism class.

7. Prove the Riemann Existence Theorem for quadratic extensions.

*8. Let S be a branched covering constructed with branch points α_1, ..., α_r, curves
C_1, ..., C_r, and permutations σ_1, ..., σ_r. Prove that S is connected if and only if the
subgroup Σ of the symmetric group S_n which is generated by the permutations σ_ν operates
transitively on the indices 1, ..., n.

*9. It can be shown that the Riemann surface S of a function field is homeomorphic to the
complement of a finite set of points in a compact oriented two-dimensional manifold S̄.
The genus of such a surface is defined to be the number of holes in the corresponding
manifold S̄. So if S̄ is a sphere, the genus of S is 0, while if S̄ is a torus, the genus of S is
1. The genus of a function field is defined to be the genus of its Riemann surface.
Determine the genus of the field defined by each polynomial.

Qy —( a 4) (be a 4 ee 

(Oy = Oe) (eye = Gye 


8. Transcendental Extensions


1. Let K = F(α) be a field extension generated by an element α, and let β ∈ K, β ∉ F.
Prove that α is algebraic over the field F(β).

2. Prove that the isomorphism Q(π) → Q(e) sending π ↦ e is discontinuous.

3. Let F ⊂ K ⊂ L be fields. Prove that tr deg_F L = tr deg_F K + tr deg_K L.

4. Let (α_1, ..., α_n) ⊂ K be an algebraically independent set over F. Prove that an element
β ∈ K is transcendental over F(α_1, ..., α_n) if and only if (α_1, ..., α_n; β) is algebraically
independent.

5. Prove Theorem (8.3).


9. Algebraically Closed Fields 


1. Derive Corollary (9.5) from Theorem (9.4).

2. Prove that the field F constructed in this text as the union of finite fields is algebraically
closed.

*3. With notation as at the end of the section, a comparison of the images f(C_r) for varying
radii shows another interesting geometric feature: For large r, the curve f(C_r) has n
loops. This can be expressed formally by saying that its total curvature is 2πn. For small
r, the linear term a_1z + a_0 dominates f(z). Then f(C_r) makes a single loop around a_0. Its
total curvature is only 2π. Something happens to the loops and the curvature, as r varies.
Explain.

*4. If you have access to a computer with a good graphics system, use it to illustrate the
variation of f(C_r) with r. Use log-polar coordinates (log r, θ).


Miscellaneous Exercises 


1. Let f(x) be an irreducible polynomial of degree 6 over a field F, and let K be a quadratic
extension of F. Prove or disprove: Either f is irreducible over K, or else f is a product of
two irreducible cubic polynomials over K.

2. (a) Let p be an odd prime. Prove that exactly half of the elements of F_p^× are squares
and that if α, β are nonsquares, then αβ is a square.
(b) Prove the same as (a) for any finite field of odd order.
(c) Prove that in a finite field of even order, every element is a square.

3. Write down the irreducible polynomial for α = √2 + √3 over Q and prove that it is
reducible modulo p for every prime p.

*4. (a) Prove that any element of GL_2(Z) of finite order has order 1, 2, 3, 4, or 6.
(b) Extend this theorem to GL_3(Z), and show that it fails in GL_4(Z).

5. Let c be a real number, not ±2. The plane curve C: x² + cxy + y² = 1 can be
parametrized rationally. To do this, we choose the point (0, 1) on C and parametrize the
lines through this point by their slope: L_t: y = tx + 1. The point at which the line L_t
intersects C can be found algebraically.

(a) Find the equation of this point explicitly.

(b) Use this procedure to find all solutions of the equation x² + cxy + y² = 1 in the
field F = F_p, when c is in that field and c ≠ ±2.

(c) Show that the number of solutions is p − 1, p, or p + 1, and describe how this
number depends on the roots of the polynomial t² + ct + 1.


6. The degree of a rational function f(x) = p(x)/q(x) ∈ C(x) is defined to be the
maximum of the degrees of p and q, when p, q are chosen to be relatively prime. Every
rational function f defines a map P¹ → P¹, by x ↦ f(x). We will denote this map by f
too.

(a) Suppose that f has degree d. Show that for any point y_0 in the plane, the fibre f^{−1}(y_0)
contains at most d points.

(b) Show that f^{−1}(y_0) consists of precisely d points, except for a finite number of y_0.
Identify the values y_0 where there are fewer than d points in terms of f and df/dx.

*7. (a) Prove that a rational function f(x) generates the field of rational functions C(x) if and
only if it is of the form (ax + b)/(cx + d), with ad − bc ≠ 0.

(b) Identify the group of automorphisms of C(x) which are the identity on C.

*8. Let K/F be an extension of degree 2 of rational function fields, say K = C(t) and
F = C(x). Prove that there are generators x′, t′ for the two fields, such that
t = (αt′ + β)/(γt′ + δ) and x = (ax′ + b)/(cx′ + d), α, β, γ, δ, a, b, c, d ∈ C,
such that t′² = x′.

*9. Fill in the following outline to give an algebraic proof of the fact that K =
C(x)[y]/(y² − x³ + x) is not a pure transcendental extension of C. Suppose that K =
C(t) for some t. Then x and y are rational functions of t.

(a) Using the result of the previous problem and replacing t by t′ as necessary, reduce to
the case that x = (at² + b)/(ct² + d).

(b) Say that y = p(t)/q(t). Then the equation y² = x(x + 1)(x − 1) reads

p(t)²/q(t)² = (at² + b)((a + c)t² + b + d)((a − c)t² + b − d) / (ct² + d)³.

Either the numerators and denominators on the two sides agree, or else there is
cancellation on the right side.

(c) Complete the proof by analyzing the two possibilities given in (b).

10. (a) Prove that the homomorphism SL_2(Z) → SL_2(F_p) obtained by reducing the matrix
entries modulo p is surjective.

(b) Prove the analogous assertion for SL_n.

*11. Determine the conjugacy classes of elements of order 2 in GL_n(Z).


Chapter 14 


Galois Theory 


En un mot les calculs sont impraticables. 


Evariste Galois 


1. THE MAIN THEOREM OF GALOIS THEORY


In the last chapter we studied algebraic field extensions, using extensions generated
by a single element as the basic tool. This amounts to studying the properties of a
single root of an irreducible polynomial


(1.1)    f(x) = x^n + a_{n−1}x^{n−1} + ··· + a_1x + a_0.


Galois theory, the topic of this chapter, is the theory of all the roots of such a
polynomial and of the symmetries among them.

We will restrict our attention to fields of characteristic zero in this chapter. It 
is to be understood that all fields occurring have characteristic zero, and we will not 
mention this assumption explicitly from now on. 

The notation K/F will indicate that K is an extension field of F. This notation
is traditional, though there is some danger of confusion with the notation R/I for the
quotient of a ring R by an ideal I.

As we have seen, computation in a field F(α) generated by a single root can
easily be made by identifying it with the formally constructed field F[x]/(f). But
suppose that an irreducible polynomial f(x) factors into linear factors in a field
extension K, and that its roots in K are α_1, ..., α_n. How to compute with all these roots at
the same time isn’t clear. To do so we have to know how the roots are related, and
this depends on the particular case. In principle, the relations can be obtained by
expanding the equation f(x) = (x − α_1)(x − α_2)···(x − α_n). Doing so, we find that
the sum of the roots is −a_{n−1}, that their product is ±a_0 (the sign being (−1)^n), and
so on. However, it may not be easy to interpret these relations directly.
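
The expansion described above can be carried out mechanically. The following is a minimal sketch in Python, not part of the text: the helper name expand_monic and the sample roots are our own choices, and rational roots are used only so that the arithmetic is exact (such a polynomial is of course not irreducible). It multiplies out the product of the factors (x − α_i) and compares the result with the two relations just mentioned.

```python
from fractions import Fraction

def expand_monic(roots):
    """Return [a_0, a_1, ..., a_{n-1}, 1], the coefficients of prod (x - r)."""
    coeffs = [Fraction(1)]                      # the constant polynomial 1
    for r in roots:
        shifted = [Fraction(0)] + coeffs        # x * (current polynomial)
        scaled = [-r * c for c in coeffs] + [Fraction(0)]   # -r * (current polynomial)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

# Hypothetical sample roots, chosen only for illustration.
roots = [Fraction(1), Fraction(2), Fraction(-5)]
coeffs = expand_monic(roots)
n = len(roots)

product_of_roots = Fraction(1)
for r in roots:
    product_of_roots *= r

# Sum of the roots is -a_{n-1}; product of the roots is (-1)^n a_0.
sum_relation_holds = sum(roots) == -coeffs[n - 1]
product_relation_holds = product_of_roots == (-1) ** n * coeffs[0]
```

The remaining coefficients give the other elementary symmetric functions of the roots in the same way, up to sign.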
The fundamental discovery which arose through the work of several people,
especially of Lagrange and Galois, is that the relationships between the roots can be
understood in terms of symmetry. The original model for this symmetry is complex
conjugation, which permutes the two roots ±i of the irreducible real polynomial
x² + 1, while leaving the real numbers fixed. We will begin by observing that such a
symmetry exists for any quadratic field extension.

An extension K/F of degree 2 is generated by any element α of K which is not
in F. Moreover, α is a root of an irreducible quadratic polynomial


(1.2)    f(x) = x² + bx + c


with coefficients in F. Then α′ = −b − α is also a root of f, so this polynomial
splits into linear factors over K: f(x) = (x − α)(x − α′).

The fact that α and α′ are roots of the same irreducible polynomial provides
us with our symmetry. According to Proposition (2.9) of Chapter 13, there is an
isomorphism


(1.3)    σ: F(α) → F(α′),


which is the identity on F and which sends α ↦ α′. But either root generates the
extension: F(α) = K = F(α′). Therefore σ is an automorphism of K.

This automorphism switches the two roots α, α′. For, since σ is the identity
on F, it fixes b, and α + α′ = −b. So if σ(α) = α′, we must have σ(α′) = α. It
follows that σ² sends α ↦ α and, since α generates K over F, that σ² is the
identity.

Note also that σ is not the identity automorphism, because the two roots α, α′
are distinct. If α were a double root of the quadratic polynomial (1.2), the quadratic
formula would give α = −½b. This would imply α ∈ F, contrary to our hypothesis
that f is irreducible.

Since our field F is assumed to have characteristic zero, the quadratic extension
K can be obtained by adjoining a square root δ of the discriminant D = b² − 4c, a
root of the irreducible polynomial x² − D. Its other root is −δ, and σ interchanges
the two square roots.

Whenever K is obtained by adjoining a square root δ, there is an automorphism
which sends δ ↦ −δ. For example, let α = 1 + √2, and let K = Q(α).
The irreducible polynomial for α over Q is x² − 2x − 1, and the other root of this
polynomial is α′ = 1 − √2. There is an automorphism σ of K which sends
√2 ↦ −√2 and α ↦ α′. It is important to note right away that such an
automorphism will not be continuous when K is considered as a subfield of R. It is a
symmetry of the algebraic structure of K, but it does not respect the geometry given
by the embedding of K into the real line.
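
The example K = Q(√2) can be checked by exact arithmetic. The sketch below is only an illustration under our own conventions, not a construction from the text: an element a + b√2 is stored as a pair (a, b) of rational numbers, and the function names add, mul, sigma and f are hypothetical. It confirms that both α and α′ are roots of x² − 2x − 1 and that the map √2 ↦ −√2 respects multiplication on a sample pair of elements.

```python
from fractions import Fraction

def add(x, y):
    """Add two elements a + b*sqrt(2), stored as pairs (a, b)."""
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    """Multiply (a + b*sqrt(2)) by (c + d*sqrt(2)), using sqrt(2)^2 = 2."""
    a, b = x
    c, d = y
    return (a * c + 2 * b * d, a * d + b * c)

def sigma(x):
    """The map sending sqrt(2) to -sqrt(2) and fixing the rational numbers."""
    return (x[0], -x[1])

def f(x):
    """Evaluate x^2 - 2x - 1 at an element of Q(sqrt(2))."""
    minus_two_x = (-2 * x[0], -2 * x[1])
    return add(add(mul(x, x), minus_two_x), (Fraction(-1), Fraction(0)))

alpha = (Fraction(1), Fraction(1))     # 1 + sqrt(2)
alpha_prime = sigma(alpha)             # 1 - sqrt(2)

both_are_roots = f(alpha) == (0, 0) and f(alpha_prime) == (0, 0)

# Sample elements (arbitrary choices) on which to test multiplicativity.
x = (Fraction(1, 2), Fraction(3))
y = (Fraction(-2), Fraction(5, 7))
sigma_is_multiplicative_here = sigma(mul(x, y)) == mul(sigma(x), sigma(y))
```

A check on sample elements is of course no proof; the general statement follows from Proposition (2.9) of Chapter 13 as explained above.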

By definition, an F-automorphism of an extension field K is an automorphism
which is the identity on the subfield F [see Chapter 13 (2.10)]. In other words, an
automorphism σ of K is an F-automorphism if σ(c) = c for all c ∈ F. Thus
complex conjugation is an R-automorphism of C, and the symmetry σ we have just
found is an F-automorphism of the quadratic extension K. It is not difficult to show
that σ is the only F-automorphism of this extension other than the identity.

The group of all F-automorphisms of K is called the Galois group of the field 
extension. We often denote this group by G(K/F). When K/F is a quadratic exten- 
sion, the Galois group G(K/F) is a group of order 2. 

Let us now consider the next simplest example, that of a biquadratic extension.
We will call a field extension K/F biquadratic if [K:F] = 4 and if K is generated by
the roots of two irreducible quadratic polynomials. Every such extension has the
form


(1.4)    K = F(α, β),


where α² = a and β² = b, and where a, b are elements of F. The element β generates
an intermediate field, a field F(β) between F and K. Since K = F(α, β), the
requirement that [K:F] = 4 implies that F(β) has degree 2 over F and that α is not
in the field F(β). So the polynomial x² − a is irreducible over F(β). Similarly, the
polynomial x² − b is irreducible over the intermediate field F(α).

Notice that K is an extension of F(β) of degree 2, generated by α. Let us apply
what we have just learned about quadratic extensions to this extension. Substituting
F(β) for F, we find that there is an F(β)-automorphism of K which interchanges the
two roots ±α of x² − a. Call this automorphism σ. Since it is the identity on F(β),
σ is also the identity on F, so it is an F-automorphism too. Similarly, there is an
F(α)-automorphism τ of K which interchanges the roots ±β of x² − b, and τ is also
an F-automorphism.

The two automorphisms we have found operate on the roots α, β as follows:

    σ:  α ↦ −α,  β ↦ β;        τ:  α ↦ α,  β ↦ −β.

Composing these operations, we find that στ changes the signs of both roots α, β
and that the automorphisms σ², τ², and στστ leave α and β fixed. Since K is
generated over F by the roots, these last three automorphisms are all equal to the identity.
Therefore the four automorphisms {1, σ, τ, στ} form a group of order 4, with
relations


(1.5)    σ² = 1,  τ² = 1,  στ = τσ.


We have shown that the Galois group G(K/F) contains the Klein four group. In fact
it is equal to that group, as we shall see in a moment.

For example, let F = Q, α = i, and β = √2, so that K = Q(i, √2). In this
case, the automorphism σ is complex conjugation, while τ sends √2 ↦ −√2,
fixing i.
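
For the field K = Q(i, √2), the relations among σ and τ can be verified by direct computation. The following Python sketch is an illustration under assumptions of our own: an element of K is stored as a 4-tuple of rational coordinates with respect to (1, i, √2, i√2), and the names mul, sigma and tau are hypothetical helpers. It checks on sample elements that σ and τ respect multiplication, that they satisfy σ² = 1, τ² = 1, στ = τσ, and that 1, σ, τ, στ act differently on i + √2.

```python
from fractions import Fraction

def mul(x, y):
    """Multiply two elements of Q(i, sqrt 2) given in the basis (1, i, sqrt2, i*sqrt2)."""
    out = [Fraction(0)] * 4
    for j, xj in enumerate(x):
        for k, yk in enumerate(y):
            p = (j % 2) + (k % 2)        # total exponent of i
            q = (j // 2) + (k // 2)      # total exponent of sqrt(2)
            coeff = xj * yk
            if p == 2:
                coeff = -coeff           # i^2 = -1
            if q == 2:
                coeff = 2 * coeff        # sqrt(2)^2 = 2
            out[(p % 2) + 2 * (q % 2)] += coeff
    return tuple(out)

def sigma(x):
    """Complex conjugation: i -> -i, sqrt(2) fixed."""
    return (x[0], -x[1], x[2], -x[3])

def tau(x):
    """sqrt(2) -> -sqrt(2), i fixed."""
    return (x[0], x[1], -x[2], -x[3])

# Two sample elements, chosen arbitrarily for the check.
x = tuple(Fraction(c) for c in (1, 2, 3, 4))
y = (Fraction(-1), Fraction(1, 2), Fraction(0), Fraction(5))

respects_products = (sigma(mul(x, y)) == mul(sigma(x), sigma(y))
                     and tau(mul(x, y)) == mul(tau(x), tau(y)))
relations_hold = (sigma(sigma(x)) == x and tau(tau(x)) == x
                  and sigma(tau(x)) == tau(sigma(x)))

gamma = tuple(Fraction(c) for c in (0, 1, 1, 0))       # i + sqrt(2)
images = {gamma, sigma(gamma), tau(gamma), sigma(tau(gamma))}
four_distinct_automorphisms = len(images) == 4
```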

For quadratic or biquadratic extensions, the degree [K : F] is equal to the order 
of the Galois group G(K/F). We will now state two theorems, Theorems (1.6) and 
(1.11), which describe the general circumstances under which this happens. These 
theorems will be proved in later sections of the chapter. 
(1.6) Theorem. For any finite extension K/F, the order |G(K/F)| of the Galois
group divides the degree [K : F] of the extension. 


A finite field extension K/F is called a Galois extension if the order of the
Galois group is equal to the degree:


(1.7)    |G(K/F)| = [K : F].


Theorem (1.6) shows that the Galois group of a biquadratic extension has order at
most 4. Since we already have four automorphisms in hand, there are no others, and
the Galois group is the Klein four group, as was asserted. All quadratic and
biquadratic extensions are Galois.

If G is a group of automorphisms of a field K, the set of elements of K which
are fixed by all the automorphisms in G forms a subfield, called the fixed field of G.
The fixed field is often denoted by K^G:


(1.8)    K^G = {α ∈ K | φ(α) = α for all φ ∈ G}.


One consequence of Theorem (1.6) is that when K/F is a Galois extension, the only 
elements of K which are fixed by the whole Galois group are the elements of F: 


(1.9) Corollary. Let K/F be a Galois extension, with Galois group G = G(K/F). 
The fixed field of G is F. 


For let L denote the fixed field. Then F ⊂ L, and this inclusion shows that every
L-automorphism of K is also an F-automorphism, that is, that G(K/L) ⊂ G. On the
other hand, by definition of the fixed field, every element of G is an L-automorphism.
So G(K/L) = G. Now |G| = [K : F] because K/F is a Galois extension,
and by Theorem (1.6), |G| divides [K : L]. Since F ⊂ L ⊂ K, the degree [K : L] in
turn divides [K : F] = |G|. This shows that [K : F] = [K : L], hence that F = L. □


This corollary is important because it provides a method for checking that an ele- 
ment of a Galois extension K is actually in the field F. We will use it frequently. 

Being Galois is a strong restriction on a field extension, but nevertheless there 
are many Galois extensions. This is the key fact which led to Galois’ theory. In or- 
der to state the theorem which describes the Galois extensions, we need one more 
definition. 


(1.10) Definition. Let f(x) ∈ F[x] be a nonconstant monic polynomial. A
splitting field for f(x) over F is an extension field K of F such that


(i) f(x) factors into linear factors in K: f(x) = (x − α_1)···(x − α_n), with α_i ∈ K;
(ii) K is generated by the roots of f(x): K = F(α_1, ..., α_n).


The second condition just says that K is the smallest extension of F which contains
all the roots. The biquadratic extension (1.4) is a splitting field of the polynomial


f(x) = (x² − a)(x² − b).

Every polynomial f(x) ∈ F[x] has a splitting field. To find one, we choose a
field extension L in which f splits into linear factors [Chapter 13 (5.3)] and then take
for K the subfield F(α_1, ..., α_n) of L generated by the roots.


(1.11) Theorem. If K is a splitting field of a polynomial f(x) over F, then K is a
Galois extension of F. Conversely, every Galois extension is a splitting field of some
polynomial f(x) ∈ F[x].


(1.12) Corollary. Every finite extension is contained in a Galois extension. 


To derive this corollary from the theorem, let K/F be a finite extension, let
α_1, ..., α_n be generators for K over F, and let f_i(x) be the monic irreducible
polynomial for α_i over F. We extend K to a splitting field L of the product f = f_1···f_n over
K. Then L will also be a splitting field of f over F. So L is the required Galois
extension. □


(1.13) Corollary. Let K/F be a Galois extension, and let L be an intermediate 
field: F C L C K. Then K/L is a Galois extension too. 


For, if K is the splitting field of a polynomial f(x) over F, then it is also the splitting
field of the same polynomial over the larger field L, so K is a Galois extension of
L. □


Let us go back to biquadratic extensions. We can prove that the Galois group of 
such an extension has order 4 without appealing to Theorem (1.6). All that is needed 
is the following elementary proposition: 


(1.14) Proposition. 


(a) Let K be an extension of a field F, let f(x) be a polynomial with coefficients in
F, and let σ be an F-automorphism of K. If α is a root of f(x) in K, then σ(α)
is also a root.

(b) Let K be a field extension generated over F by elements α_1, ..., α_n, and let σ be
an F-automorphism of K. If σ fixes each of the generators α_i, then σ is the
identity automorphism.

(c) Let K be a splitting field of a polynomial f(x) over F. The Galois group
G(K/F) operates faithfully on the set {α_1, ..., α_n} of roots of f(x).


Proof. Part (a) was proved in the last chapter [Chapter 13 (2.10)]. To prove
part (b), assume that K is generated by α_1, ..., α_n. Then every element of K can be
expressed as a polynomial in α_1, ..., α_n with coefficients in F [Chapter 13 (2.6b)]. If
σ is an automorphism which is the identity on F and which also fixes each of the
elements α_i, then it fixes every polynomial in {α_i} with coefficients in F; hence it is the
identity. The third assertion (c) follows from the first two: The first tells us that
every σ ∈ G(K/F) permutes the set {α_1, ..., α_n}, and the second tells us that the
operation on this set is faithful. □

Proposition (1.14) does not address the most interesting question: Which
permutations of the roots of a polynomial extend to automorphisms of the splitting field?
This question is the central theme of Galois theory.

Let us apply Proposition (1.14) to the biquadratic extension (1.4). Part (a),
applied to the polynomial x² − a, shows that any F-automorphism φ of K permutes
the roots ±α. Similarly, φ permutes ±β. Only four permutations of {±α, ±β} act
in this way. Since the elements α, β generate K, (1.14b) tells us that an
F-automorphism which fixes both of them is the identity. So the four automorphisms which we
have already found are the only ones. This proves that G(K/F) is the Klein four
group.

One of the most important parts of Galois theory is the determination of the
intermediate fields L, those sandwiched between F and K: F ⊂ L ⊂ K. The Main
Theorem of Galois theory asserts that when K/F is a Galois extension, the
intermediate fields are in bijective correspondence with the subgroups of the Galois group.
The importance of this correspondence is not immediately clear. We will have to see
it used to understand it.

The intermediate field corresponding to a subgroup H of G(K/F) is the fixed
field K^H of H, which was defined above. In the other direction, if L is an
intermediate field, the Galois group G(K/L) is a subgroup of G(K/F). This is the subgroup
which corresponds to L.


(1.15) Theorem. The Main Theorem: Let K be a Galois extension of a field F, and
let G = G(K/F) be its Galois group. The function

    H ↦ K^H


is a bijective map from the set of subgroups of G to the set of intermediate fields
F ⊂ L ⊂ K. Its inverse function is


    L ↦ G(K/L).

This correspondence has the property that if H = G(K/L), then

(1.16)    [K : L] = |H|,  hence  [L : F] = [G : H].


We will prove this theorem in Section 5. 


The fields F and K are included among the intermediate fields. The subgroup 
which corresponds to the field F is the whole group G [see (1.9)], and the one corre- 
sponding to K is the trivial subgroup {1}. 

Let us go back to our example of the biquadratic extension K = Q(i, √2), for
which σ is complex conjugation, while τ sends √2 ↦ −√2. Its Galois
group, the Klein four group, has three nontrivial proper subgroups:


H_1 = {1, σ},  H_2 = {1, τ},  H_3 = {1, στ}.


According to the Main Theorem, there are three proper intermediate fields, namely
the fixed fields L_i of these subgroups. They are easily determined:


L_1 = Q(√2),  L_2 = Q(i),  and  L_3 = Q(i√2).
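
The fixed fields can be read off from the way the four automorphisms act on the basis (1, i, √2, i√2) of K over Q: each automorphism multiplies each basis element by ±1, so the fixed field of a subgroup is spanned by the basis elements that every member of the subgroup leaves alone. The short Python sketch below tabulates this; the dictionary and function names are our own, introduced only for this illustration.

```python
BASIS = ["1", "i", "sqrt2", "i*sqrt2"]

# Sign by which each automorphism multiplies each basis element, in BASIS order.
SIGNS = {
    "1":         (1, 1, 1, 1),
    "sigma":     (1, -1, 1, -1),     # complex conjugation
    "tau":       (1, 1, -1, -1),     # sqrt2 -> -sqrt2
    "sigma*tau": (1, -1, -1, 1),
}

SUBGROUPS = {
    "H1": ["1", "sigma"],
    "H2": ["1", "tau"],
    "H3": ["1", "sigma*tau"],
}

def fixed_basis(subgroup):
    """Basis elements fixed by every automorphism in the subgroup."""
    return [b for k, b in enumerate(BASIS)
            if all(SIGNS[g][k] == 1 for g in subgroup)]

fixed_fields = {name: fixed_basis(members) for name, members in SUBGROUPS.items()}
# Each fixed field has dimension 2 over Q, in agreement with [L : F] = [G : H] = 4/2.
```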
A Galois group is finite, so it has finitely many subgroups. But without the
Main Theorem, it isn’t obvious that there are only finitely many intermediate fields.
It might seem natural to expect two randomly chosen elements of a Galois extension
K/F to generate different subfields. This tends not to happen, and in fact most
elements will generate the whole extension K. The case of the biquadratic extension
K = Q(i, √2) will illustrate this point. Let γ be any element of K. The field Q(γ)
generated by γ must be one of the intermediate fields we have found. So if γ is not
contained in Q(i), Q(√2), or Q(i√2), then Q(γ) = K. Now the set
(1, i, √2, i√2) is a basis for K over F, so we may write an arbitrary element γ in
the form


γ = c_1 + c_2 i + c_3√2 + c_4 i√2,  with c_i ∈ Q.


This element is not in one of the three proper intermediate fields unless two of the
coefficients c_2, c_3, c_4 are zero. The element i + √2, for example, generates the
whole extension K. We will return to this point in Section 4.
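
The claim about i + √2 can be tested numerically in exact arithmetic. Since [K : Q] = 4, the degree of Q(γ) over Q is the dimension of the span of 1, γ, γ², γ³. The sketch below, written under our own conventions (coordinates with respect to (1, i, √2, i√2); the helper names mul, rank and degree_over_Q are hypothetical), computes that dimension by Gaussian elimination.

```python
from fractions import Fraction

def mul(x, y):
    """Multiply two elements of Q(i, sqrt 2) given in the basis (1, i, sqrt2, i*sqrt2)."""
    out = [Fraction(0)] * 4
    for j, xj in enumerate(x):
        for k, yk in enumerate(y):
            p = (j % 2) + (k % 2)        # total exponent of i
            q = (j // 2) + (k // 2)      # total exponent of sqrt(2)
            coeff = xj * yk
            if p == 2:
                coeff = -coeff           # i^2 = -1
            if q == 2:
                coeff = 2 * coeff        # sqrt(2)^2 = 2
            out[(p % 2) + 2 * (q % 2)] += coeff
    return tuple(out)

def rank(rows):
    """Rank of a list of rational vectors, by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def degree_over_Q(gamma):
    """Dimension over Q of the span of 1, gamma, gamma^2, gamma^3."""
    one = (Fraction(1), Fraction(0), Fraction(0), Fraction(0))
    powers = [one]
    for _ in range(3):
        powers.append(mul(powers[-1], gamma))
    return rank(powers)

degree_of_i_plus_sqrt2 = degree_over_Q((0, 1, 1, 0))    # gamma = i + sqrt(2)
degree_of_i_sqrt2 = degree_over_Q((0, 0, 0, 1))         # gamma = i*sqrt(2)
```

The first degree comes out as 4, so i + √2 generates K, while i√2 has degree 2 and generates only the intermediate field Q(i√2).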


2. CUBIC EQUATIONS 


Having examined biquadratic extensions in the last section, we now turn to the next
general class of examples, the splitting fields of cubic polynomials. Cubic equations


(2.1)    f(x) = x³ + a_2x² + a_1x + a_0 = 0


were solved explicitly in terms of square roots and cube roots in the sixteenth
century by the mathematicians Tartaglia and Cardano. We will begin by reviewing their
remarkable ad hoc solution.

The computation is simpler when the coefficient of degree 2 in f(x) vanishes.
The quadratic term in our general equation (2.1) can be eliminated by the
substitution

(2.2)    x = x_1 − a_2/3.

Let us write a cubic whose quadratic term vanishes as

(2.3)    f(x) = x³ + px + q,


where the coefficients p, q are elements of the field F. Cardano’s solution of the
equation f = 0 starts with the substitution x = u − v. Collecting terms in
f(u − v), we find


f(u − v) = (u³ − v³) − (3uv − p)(u − v) + q.


The point of replacing the variable x by a sum of variables is that we can now split
our equation apart. Clearly, f(u − v) = 0 if the two equations


3uv − p = 0,    u³ − v³ + q = 0


hold. And since we have two variables, we may hope to obtain solutions to such a
pair of equations, though it isn’t clear a priori that this will help. We solve the first
equation for v = p/(3u) and substitute into the second. Clearing the denominator
gives

3³u⁶ − p³ + 3³u³q = 0.


Miraculously, this equation is quadratic in u³. Setting y = u³, it reduces to


(2.4)    3³y² + 3³qy − p³ = 0.

This equation can be solved by the quadratic formula:

(2.5)    y = −q/2 ± √((q/2)² + (p/3)³).

Thus we obtain Cardano’s Formula x = u − v, where

(2.6)    u = ∛(−q/2 + √((q/2)² + (p/3)³)),    v = ∛(u³ + q) = ∛(q/2 + √((q/2)² + (p/3)³)).


We will be able to prove the existence of a solution of this general type later,
without explicit computation [see (7.6)].
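
Cardano's recipe can be tried out numerically. The sketch below is an illustration over the complex numbers using Python's floating-point arithmetic, not the exact field-theoretic statement; the function name cardano_root and the sample coefficients are our own. It takes y to be a root of (2.4), chooses u as a cube root of y, and then sets v = p/(3u) so that the condition 3uv − p = 0 holds for the chosen cube root; the value x = u − v is then substituted back into x³ + px + q.

```python
import cmath

def cardano_root(p, q):
    """Return one complex root of x^3 + p*x + q, computed as x = u - v."""
    root = cmath.sqrt((q / 2) ** 2 + (p / 3) ** 3)
    y = -q / 2 + root                 # y = u^3, one root of the quadratic (2.4)
    if y == 0:
        y = -q / 2 - root             # use the other root of (2.4) instead
    if y == 0:
        return 0j                     # only happens when p = q = 0
    u = y ** (1 / 3)                  # principal complex cube root
    v = p / (3 * u)                   # forces 3uv - p = 0
    return u - v

def residual(p, q):
    """Absolute value of x^3 + p*x + q at the computed root."""
    x = cardano_root(p, q)
    return abs(x ** 3 + p * x + q)

# Sample cubics (arbitrary choices): one real root, three real roots, and p = 0.
samples = [(3, 1), (-7, 6), (0, 2)]
residuals = [residual(p, q) for p, q in samples]
```

The other two roots are obtained from the other two cube roots of y. When (q/2)² + (p/3)³ is negative, as for x³ − 7x + 6, the intermediate quantities are non-real even though all three roots are real.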

Let us now examine the Galois theory of an irreducible cubic polynomial f(x).
We may assume that f(x) has the form (2.3). Let K be a splitting field of f(x) over F,
and let α_1, α_2, α_3 be the three roots of f(x) in K, ordered in an arbitrary way, so that


(2.7)    f(x) = x³ + px + q = (x − α_1)(x − α_2)(x − α_3).

Expanding the right side of this equation, we obtain the relations

          α_1 + α_2 + α_3 = 0
(2.8)     α_1α_2 + α_1α_3 + α_2α_3 = p
          α_1α_2α_3 = −q.


The first of these relations shows that the third root α_3 is in the field generated by the
first two roots. Thus we have a chain of fields


F ⊂ F(α_1) ⊂ K,


and K = F(α_1, α_2) = F(α_1, α_2, α_3). Let us denote F(α_1) by L. There are two
fundamentally different cases which may arise, namely either


(2.9)    L = K  or  L < K.


In terms of the roots, the first case occurs when the last two roots α_2 and α_3 can be
expressed in terms of α_1 and elements of F, that is, if they can be written as
polynomials in α_1 with coefficients in F [see Chapter 13 (2.6)]. The second case occurs
when the last two roots can not be expressed in this way.

For example, let $f(x) = x^3 - 2$. The three roots of this polynomial are
$\alpha_1 = \sqrt[3]{2}$, $\alpha_2 = \zeta\sqrt[3]{2}$, $\alpha_3 = \zeta^2\sqrt[3]{2}$, where $\sqrt[3]{2}$ denotes the real cube root of 2 and
$\zeta = e^{2\pi i/3}$. Since $\alpha_1$ is real, the field $\mathbb{Q}(\alpha_1)$ is contained in $\mathbb{R}$. It doesn’t contain the
other two roots, which are complex. Hence if $F = \mathbb{Q}$ and $L = \mathbb{Q}(\alpha_1)$, we are in the
second case. On the other hand, if we let $F = \mathbb{Q}(\zeta)$, then $F(\alpha_1)$ contains $\alpha_2$, so we
are in the first case.

To analyze the dichotomy (2.9), we consider the way the irreducible polynomial
f(x) factors in the field L. By assumption, f(x) is irreducible in F[x], and it
factors into linear factors in K[x]. In the ring L[x], f(x) has the factor $(x - \alpha_1)$:

(2.10)    $f(x) = (x - \alpha_1)h(x)$,

where h(x) is a quadratic polynomial with coefficients in L. Division by $x - \alpha_1$
gives the same result if it is carried out in the larger field K. Looking at (2.7), we
see that $h(x) = (x - \alpha_2)(x - \alpha_3)$ in K[x]. Therefore $L < K$ if and only if h(x) is
irreducible over L. In this case, the degree of $L(\alpha_2) = K$ over L is 2. Also, since we
assume f(x) irreducible over F, [L : F] = 3 in either case. So we have

(2.11)    $[K : F] = \begin{cases} 3 & \text{if } L = K \\ 6 & \text{if } L < K. \end{cases}$

(2.12) Example. The polynomial $f(x) = x^3 + 3x + 1$ is irreducible over $\mathbb{Q}$, and it
has only one real root. To see that there is only one real root, we note that the
derivative of f does not vanish on the real line. Therefore f(x) defines an increasing
function of the real variable x. It takes the value 0 only once. The real root does not
generate the splitting field K, which also contains two complex roots. So
$[K : \mathbb{Q}] = 6$ in this case.

On the other hand, the splitting field of the polynomial $f(x) = x^3 - 3x + 1$
over $\mathbb{Q}$ has degree 3. One of its roots is $\eta_1 = 2\cos(2\pi/9) = \zeta + \zeta^8$, where
$\zeta = e^{2\pi i/9}$. Having the polynomial in hand, we can check this directly. But actually,
we made this example by computing the irreducible polynomial for $\eta_1$ over $\mathbb{Q}$. The
way to compute this polynomial is to guess its other roots. We note that $\eta_1$ is the
sum of a ninth root of 1 and its inverse. There are two other sums of this sort:
$\eta_2 = \zeta^2 + \zeta^7$ and $\eta_3 = \zeta^4 + \zeta^5$. We guess that these are the other roots and expand
$(x - \eta_1)(x - \eta_2)(x - \eta_3)$, obtaining f. In this example, $\eta_2$ happens to be equal to
$\eta_1^2 - 2$, and $\eta_3 = -\eta_1 - \eta_2$. So $K = F(\eta_1)$. □
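
The guess made in this example can be checked numerically. The short Python sketch below forms the three sums of a ninth root of 1 and its inverse, expands the product of the linear factors through its elementary symmetric functions, and compares with $x^3 - 3x + 1$. Floating-point arithmetic is used, so this is a plausibility check under that assumption, not a proof; the variable names are chosen only for this illustration.

```python
import cmath

zeta = cmath.exp(2j * cmath.pi / 9)          # primitive ninth root of 1
etas = [zeta + zeta**8, zeta**2 + zeta**7, zeta**4 + zeta**5]

# Elementary symmetric functions of the three candidate roots.
s1 = etas[0] + etas[1] + etas[2]
s2 = etas[0] * etas[1] + etas[0] * etas[2] + etas[1] * etas[2]
s3 = etas[0] * etas[1] * etas[2]

# (x - eta1)(x - eta2)(x - eta3) = x^3 - s1*x^2 + s2*x - s3
coefficients = [1, -s1, s2, -s3]

# The relation eta2 = eta1^2 - 2 noted in the example.
relation_error = abs(etas[1] - (etas[0] ** 2 - 2))
```

Up to rounding error, `coefficients` agrees with the coefficient list 1, 0, -3, 1 of $x^3 - 3x + 1$, and `relation_error` is negligible.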


We go back to a general cubic equation. According to Theorem (1.11), the order
of the Galois group G = G(K/F) is the degree of the field extension [K : F].
For cubic equations, this degree determines the group G completely. Namely,
Proposition (1.14) tells us that G operates faithfully on the set $\{\alpha_1, \alpha_2, \alpha_3\}$ of roots. These
roots are distinct [Chapter 13 (5.8)]. So G is a subgroup of the symmetric group $S_3$,
which has order 6. If [K : F] = 6, then G is the whole symmetric group. In this
case any permutation of the roots is realized by an F-automorphism of K. On the
other hand, the only subgroup of $S_3$ of order 3 is the alternating group $A_3$, a cyclic
group. So if [K : F] = 3, then $G = A_3$. In this case the cyclic permutations and the
identity are the only ones which extend to F-automorphisms. Thus the roots of an
irreducible cubic polynomial may have either dihedral or cyclic symmetry. But these
symmetries are algebraic; they will not be symmetries of K when this field is viewed
as a set of points in the complex plane.

Let us determine the intermediate fields in the case that the degree [K : F] is
6. (There are no intermediate fields properly between F and K when [K : F] = 3.)
The symmetric group $S_3$ has three conjugate subgroups of order 2 and one subgroup,
$A_3$, of order 3. There are three obvious intermediate fields: $F(\alpha_1)$, $F(\alpha_2)$, $F(\alpha_3)$.
They are isomorphic but not equal subfields of K, and they correspond to the three
subgroups of order 2. But the intermediate field which corresponds to the subgroup
$A_3$ is not obvious. Let us denote this mystery field by L. According to the Main
Theorem, $G(K/L) = A_3$. Hence [K : L] = 3 and [L : F] = 2. So L is a quadratic
extension of F, which can be obtained by adjoining a square root. The Main Theorem
has told us an interesting fact: K contains the square root $\delta$ of an element of F. And
since there is only one intermediate extension of degree 2, this square root is
essentially unique. The Main Theorem also tells us that L is the fixed field of the subgroup
$A_3$. So an even permutation of the roots leaves $\delta$ fixed, while an odd permutation
does not. The required element is

(2.13)    $\delta = (\alpha_1 - \alpha_2)(\alpha_1 - \alpha_3)(\alpha_2 - \alpha_3)$.

A permutation of the roots multiplies $\delta$ by the sign of the permutation. Hence $\delta$ is
not fixed by all elements of $G(K/F) = S_3$, so $\delta \notin F$. But $\delta^2$ is fixed by every
permutation. Corollary (1.9) tells us that $\delta^2 \in F$.

For any cubic polynomial $f(x) = (x - \alpha_1)(x - \alpha_2)(x - \alpha_3)$, the element

(2.14)    $D = (\alpha_1 - \alpha_2)^2(\alpha_1 - \alpha_3)^2(\alpha_2 - \alpha_3)^2$

is called the discriminant of the polynomial. It is an element of the field F which is
zero if and only if two roots of f(x) are equal. So it is analogous to the discriminant
of the quadratic polynomial $x^2 + bx + c = (x - \alpha_1)(x - \alpha_2)$, which is $b^2 - 4c = (\alpha_1 - \alpha_2)^2$.
If the cubic f is irreducible, then its roots are distinct, hence $D \neq 0$.

The fact that the discriminant of the cubic polynomial is an element of F
follows from Corollary (1.9), but it is not trivial. We will prove it abstractly in the next
section, but it can also be checked by direct calculation. Using formulas (2.8), we
can compute the discriminant in terms of the coefficients p, q. It is

(2.15)    $D = -4p^3 - 27q^2$.


(2.16) Proposition. The discriminant of an irreducible cubic polynomial
$f(x) \in F[x]$ is a square in F if and only if the degree of the splitting field is 3.


If we choose a polynomial with integer coefficients at random, the chances are
good that its discriminant will not be a square in $\mathbb{Q}$. For example, the discriminant
of $x^3 + 3x + 1$ is -135. On the other hand, the discriminant of $x^3 - 3x + 1$ is 81,
a square. This agrees with the fact that [K : F] = 3 [see (2.12)].
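
The criterion of Proposition (2.16) is easy to apply mechanically when the coefficients are integers. The following Python sketch evaluates formula (2.15) and tests whether the result is a perfect square. The function names are hypothetical, and the sketch assumes, without checking, that the cubic is irreducible over $\mathbb{Q}$; for a reducible cubic the returned degree would be meaningless.

```python
from math import isqrt

def cubic_discriminant(p, q):
    """Discriminant -4p^3 - 27q^2 of x^3 + p*x + q, formula (2.15)."""
    return -4 * p**3 - 27 * q**2

def splitting_field_degree(p, q):
    """Degree [K : Q] for x^3 + p*x + q with integer p, q.

    Assumes (without checking) that the cubic is irreducible over Q.
    """
    d = cubic_discriminant(p, q)
    # A nonzero integer is a square in Q exactly when it is a perfect square.
    is_square = d >= 0 and isqrt(d) ** 2 == d
    return 3 if is_square else 6
```

With the two polynomials of Example (2.12), `cubic_discriminant(3, 1)` and `cubic_discriminant(-3, 1)` reproduce the values -135 and 81 quoted above.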


Proof of the Proposition. If D is not a square, then $\delta \notin F$, and therefore
$[F(\delta) : F] = 2$. Since $\delta \in K$, [K : F] is divisible by 2, hence by (2.11), [K : F] =
6. On the other hand, if $\delta \in F$, then every element of the Galois group
G = G(K/F) fixes $\delta$. Since odd permutations change the sign of $\delta$, they are not in
G, and hence $G \neq S_3$. Therefore [K : F] = 3. □


How could such a proposition be true? There must be a formula which
expresses the second root $\alpha_2$ in terms of the elements $\alpha_1, \delta$, and the coefficients p, q.
This formula exists, and it is instructive to compute it explicitly.
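
One way to obtain such a formula (a sketch, not necessarily the computation the text has in mind) is to combine $\alpha_2 + \alpha_3 = -\alpha_1$ from (2.8) with $(\alpha_1 - \alpha_2)(\alpha_1 - \alpha_3) = f'(\alpha_1) = 3\alpha_1^2 + p$, which gives $\alpha_2 - \alpha_3 = \delta/(3\alpha_1^2 + p)$ and hence $\alpha_2 = \tfrac{1}{2}\left(-\alpha_1 + \delta/(3\alpha_1^2 + p)\right)$. The Python fragment below checks this numerically for $x^3 - 3x + 1$; the function name `second_root` is an assumption made for this illustration.

```python
import math

p, q = -3, 1                                   # the cubic x^3 - 3x + 1 of (2.12)
a1, a2, a3 = (2 * math.cos(2 * math.pi * k / 9) for k in (1, 2, 4))

delta = (a1 - a2) * (a1 - a3) * (a2 - a3)      # a square root of D = 81

def second_root(alpha1, delta, p):
    # alpha2 + alpha3 = -alpha1 and alpha2 - alpha3 = delta / f'(alpha1)
    return (-alpha1 + delta / (3 * alpha1**2 + p)) / 2

alpha2 = second_root(a1, delta, p)
```

Replacing $\delta$ by $-\delta$ in the same formula yields $\alpha_3$ instead, which reflects the fact that an odd permutation of the roots changes the sign of $\delta$.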


3. SYMMETRIC FUNCTIONS 


Galois theory is concerned with the problem of determining those permutations of 
the roots of a polynomial which extend to field automorphisms. In this section we ex- 
amine a simple situation in which every permutation extends, namely when the roots 
are independent variables. 

Let R be any ring, and consider the polynomial ring $R[u_1, \ldots, u_n]$ in n variables
$u_i$. A permutation $\sigma$ of $\{1, \ldots, n\}$ can be made to operate on polynomials, by
permuting the variables. We must decide here how we want permutations to operate. Let us
keep automorphisms on the left. Then $\sigma$ operates by the inverse permutation on the
indices:

(3.1)    $f = f(u_1, \ldots, u_n) \mapsto f(u_{1\sigma^{-1}}, \ldots, u_{n\sigma^{-1}}) = \sigma f$.

This is clearly an automorphism of R[u]. Since it acts as the identity on R, $\sigma$ is
called an R-automorphism. So the symmetric group $S_n$ operates by R-automorphisms
on the polynomial ring R[u]. A polynomial is called symmetric if it is left fixed by
all permutations.

It is easy to describe the symmetric polynomials. In order for g to be symmetric,
two monomials in $\{u_1, \ldots, u_n\}$ which differ by a permutation of the indices, such
as $u_1^2u_2$ and $u_2^2u_3$, must have the same coefficients in g. A symmetric polynomial
which involves a given monomial must include the whole orbit. Thus

$g(u) = (u_1^3 + u_2^3 + u_3^3) + 5(u_1^2u_2 + u_1^2u_3 + u_2^2u_3 + u_2^2u_1 + u_3^2u_1 + u_3^2u_2) - u_1u_2u_3$

is a symmetric polynomial of degree 3 in three variables.
There are n special symmetric polynomials with integer coefficients, called the
elementary symmetric functions $s_i$:

(3.2)    $s_1 = u_1 + u_2 + \cdots + u_n$
         $s_2 = u_1u_2 + u_1u_3 + \cdots + u_{n-1}u_n = \sum_{i<j} u_iu_j$
         $s_3 = \sum_{i<j<k} u_iu_ju_k$
         $\vdots$
         $s_n = u_1u_2 \cdots u_n$.

They are the coefficients of the polynomial $(x - u_1)(x - u_2) \cdots (x - u_n)$ when it is
expanded as a polynomial in x:

(3.3)    $p(x) = (x - u_1)(x - u_2) \cdots (x - u_n) = x^n - s_1x^{n-1} + s_2x^{n-2} - \cdots \pm s_n$.

We have reversed the order of the indices and alternated the sign here. The
coefficients $s_i$ are symmetric because p(x) is symmetric with respect to permutation
of the indices.
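
The relation (3.3) between the elementary symmetric functions and the coefficients of the expanded product can be sketched in a few lines of Python. The two helper functions below are hypothetical names introduced for this illustration: one evaluates $s_1, \ldots, s_n$ directly from the definition (3.2), the other multiplies out the linear factors.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values):
    """Return [s_1, ..., s_n] evaluated at the given list of values."""
    n = len(values)
    return [sum(prod(c) for c in combinations(values, k)) for k in range(1, n + 1)]

def expand_product(values):
    """Coefficients of (x - u_1)...(x - u_n), highest power of x first."""
    coeffs = [1]
    for u in values:
        # Multiply the current polynomial by (x - u).
        coeffs = [a - u * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs
```

For any list of numbers, the k-th coefficient returned by `expand_product` equals $(-1)^k s_k$, which is the alternation of signs described above.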

The main theorem on symmetric functions asserts that the elementary symmet- 
ric functions generate the ring of all symmetric polynomials: 


(3.4) Theorem. Every symmetric polynomial $g(u_1, \ldots, u_n) \in R[u]$ can be written
in a unique way as a polynomial in the elementary symmetric functions $s_1, \ldots, s_n$. In
other words, let $z_1, \ldots, z_n$ be variables. For each symmetric polynomial g(u), there is
a unique polynomial $\varphi(z_1, \ldots, z_n) \in R[z_1, \ldots, z_n]$ such that


$g(u_1, \ldots, u_n) = \varphi(s_1, \ldots, s_n)$.


The proof of this theorem is at the end of the section. 


For example,

(3.5)    $u_1^2 + \cdots + u_n^2 = s_1^2 - 2s_2$.

The discriminant of the polynomial p(x) (3.3), defined to be

(3.6)    $D = (u_1 - u_2)^2(u_1 - u_3)^2 \cdots (u_{n-1} - u_n)^2 = \prod_{i<j}(u_i - u_j)^2 = \pm\prod_{i \neq j}(u_i - u_j)$,

is perhaps the most important symmetric polynomial. Both of the last two
expressions for the discriminant are convenient at times, so it is unfortunate that they may
differ by a sign. To go from the second expression for D to the last one requires
$\frac{1}{2}n(n - 1)$ sign changes, so the correct sign to replace the symbol $\pm$ is

(3.7)    $(-1)^{n(n-1)/2}$.


It is clear that D is a symmetric polynomial with integer coefficients. So
Theorem (3.4) tells us that it can be written as an integer polynomial in the elementary
symmetric functions. In other words, there exists a polynomial

(3.8)    $\Delta(z_1, \ldots, z_n) \in \mathbb{Z}[z_1, \ldots, z_n]$

so that $D = \Delta(s_1, \ldots, s_n)$. Unfortunately, this expression for D in terms of the
elementary symmetric functions is very complicated. I don’t know what it is for
$n > 3$.


We can compute the discriminant for n = 2 easily:

(3.9)    $(u_1 - u_2)^2 = s_1^2 - 4s_2$.

This is the familiar formula for the discriminant of the quadratic polynomial
$p(x) = x^2 - s_1x + s_2$. When n = 3, the expression for the discriminant is already
too complicated to remember:

(3.10)    $(u_1 - u_2)^2(u_1 - u_3)^2(u_2 - u_3)^2 = s_1^2s_2^2 - 4s_2^3 - 4s_1^3s_3 - 27s_3^2 + 18s_1s_2s_3$.


It is important to note that such an expression is an identity in $\mathbb{Z}[u_1, \ldots, u_n]$. It
remains true when substitutions are made for the variables $u_i$. If we are given
particular elements $\{\alpha_1, \ldots, \alpha_n\}$ in a ring R, we can expand the polynomial obtained by
substituting $\alpha_i$ for $u_i$ in p(x):

$(x - \alpha_1) \cdots (x - \alpha_n) = x^n - b_1x^{n-1} + b_2x^{n-2} - \cdots \pm b_n$.

The indices and the signs have been adjusted to agree with (3.3). Then

$b_i = s_i(\alpha_1, \ldots, \alpha_n)$

and

$\prod_{i<j}(\alpha_i - \alpha_j)^2 = \Delta(b_1, \ldots, b_n)$.

This follows by substitution of $\alpha_i$ for $u_i$.
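
Because (3.10) is an identity of integer polynomials, it can be spot-checked by substituting integers for the variables. The Python sketch below does this on a small grid of integer triples; the function names are hypothetical, and passing the check on finitely many points illustrates the identity without proving it.

```python
from itertools import product

def delta3(z1, z2, z3):
    """Right side of (3.10) with z_i substituted for s_i."""
    return z1**2 * z2**2 - 4 * z2**3 - 4 * z1**3 * z3 - 27 * z3**2 + 18 * z1 * z2 * z3

def disc_from_roots(a1, a2, a3):
    """Left side of (3.10): the product of squared differences."""
    return ((a1 - a2) * (a1 - a3) * (a2 - a3)) ** 2

def check_identity(limit=4):
    """Compare both sides of (3.10) on all integer triples in [-limit, limit]."""
    for a1, a2, a3 in product(range(-limit, limit + 1), repeat=3):
        b1, b2, b3 = a1 + a2 + a3, a1 * a2 + a1 * a3 + a2 * a3, a1 * a2 * a3
        if disc_from_roots(a1, a2, a3) != delta3(b1, b2, b3):
            return False
    return True
```

The same function `delta3` also makes sense for a cubic that has no roots in the ring at hand: for example, `delta3(0, 3, -1)` evaluates the discriminant of $x^3 + 3x + 1$ directly from its coefficients.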
It is also important that the expression of a symmetric polynomial in terms of 
the elementary symmetric functions is unique: 


(3.11) Corollary. There are no polynomial relations among the elementary
symmetric functions $s_1, \ldots, s_n$. Equivalently, the subring $R[s_1, \ldots, s_n]$ of R[u] generated
by $\{s_i\}$ is isomorphic to the polynomial ring $R[z_1, \ldots, z_n]$ in n variables.


This is a restatement of the uniqueness in Theorem (3.4). □


The corollary can be used in the following way: Let

(3.12)    $f(x) = x^n - a_1x^{n-1} + a_2x^{n-2} - \cdots \pm a_n$

be a polynomial with coefficients in a ring R. We define the discriminant of f(x) to
be the element $\Delta(a_1, \ldots, a_n)$ of R, where $\Delta(z_1, \ldots, z_n)$ is the polynomial (3.8). Since
this polynomial is unique, the discriminant is defined, whether the polynomial is a
product of linear factors in R[x] or not.

For example, let n = 3. Then formula (3.10) shows that

(3.13)    $\Delta(0, p, -q) = -4p^3 - 27q^2$,

which agrees with the formula (2.15) for the discriminant of the cubic polynomial
$x^3 + px + q$.


We can use undetermined coefficients to compute the expression of a symmetric
polynomial in terms of the elementary symmetric functions. To apply this
method, we notice that the elementary symmetric function $s_i$ has degree i in the
variables u. That is why we chose the index i for it. So we assign the weight i to the
variable $z_i$, and we define the weighted degree of a monomial $z_1^{e_1}z_2^{e_2} \cdots z_n^{e_n}$ to be

(3.14)    $e_1 + 2e_2 + \cdots + ne_n$.

Substitution of $s_i$ for $z_i$ into a polynomial of weighted degree d in z yields a
polynomial of (ordinary) degree d in $u_1, \ldots, u_n$.

For example, to compute the discriminant of a cubic polynomial in terms of
the elementary symmetric functions, we notice that its degree in u is 6. There are
seven monomials in $z_1, z_2, z_3$ of weighted degree 6:

(3.15)    $z_1^6,\ z_1^4z_2,\ z_1^3z_3,\ z_1^2z_2^2,\ z_1z_2z_3,\ z_2^3,\ z_3^2$.

So D is a linear combination of these monomials. To compute its coefficients, we
evaluate D on some special polynomials: Setting $f(x) = x^2(x - 1)$, we get D = 0,
$s_1 = 1$, and $s_2 = s_3 = 0$. Since the only one of the monomials (3.15) which does
not involve $z_2$ or $z_3$ is $z_1^6$, the coefficient of $z_1^6$ in the discriminant is zero. The
coefficients of $z_2^3$ and $z_3^2$ can be computed using the special polynomials $x^3 - x$ and
$x^3 - 1$, for example.
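
The method of undetermined coefficients can also be carried out mechanically. The Python sketch below lists the monomials of weighted degree 6, evaluates them and D at sample triples of roots, and solves the resulting linear system exactly. The choice of sample points (all triples of integers between 0 and 6) is an assumption of this sketch, made so that the system determines the coefficients; it differs from the hand computation with special polynomials described above, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Exponent vectors (e1, e2, e3) with e1 + 2*e2 + 3*e3 == 6, as in (3.15).
MONOMIALS = [(e1, e2, e3) for e1 in range(7) for e2 in range(4) for e3 in range(3)
             if e1 + 2 * e2 + 3 * e3 == 6]

def elementary(u1, u2, u3):
    return (u1 + u2 + u3, u1 * u2 + u1 * u3 + u2 * u3, u1 * u2 * u3)

def disc_from_roots(u1, u2, u3):
    return ((u1 - u2) * (u1 - u3) * (u2 - u3)) ** 2

def solve_coefficients():
    """Coefficient of each monomial of (3.15) in the discriminant of a cubic."""
    rows = []
    for u in combinations_with_replacement(range(7), 3):
        s = elementary(*u)
        row = [Fraction(s[0] ** e1 * s[1] ** e2 * s[2] ** e3) for e1, e2, e3 in MONOMIALS]
        row.append(Fraction(disc_from_roots(*u)))      # augmented column
        rows.append(row)
    ncols = len(MONOMIALS)
    for col in range(ncols):                           # Gauss-Jordan elimination
        piv = next((r for r in range(col, len(rows)) if rows[r][col] != 0), None)
        if piv is None:
            raise ValueError("sample points do not determine the coefficients")
        rows[col], rows[piv] = rows[piv], rows[col]
        pivot = rows[col][col]
        rows[col] = [x / pivot for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    assert all(r[-1] == 0 for r in rows[ncols:])       # the system is consistent
    return {MONOMIALS[i]: rows[i][-1] for i in range(ncols)}
```

The dictionary returned by `solve_coefficients()` assigns the coefficient 0 to $z_1^6$ and $z_1^4z_2$ and reproduces the five nonzero coefficients of formula (3.10).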


Proof of Theorem (3.4). Let’s warm up by working out the case of the
symmetric polynomial

$f(u) = u_1^2u_2 + u_1^2u_3 + u_2^2u_1 + u_2^2u_3 + u_3^2u_1 + u_3^2u_2$

as an example. To analyze it, our first step is to set $u_3 = 0$. We obtain a symmetric
polynomial $f^0 = u_1^2u_2 + u_2^2u_1$ in the remaining variables $u_1, u_2$. Let us denote the
elementary symmetric functions in $u_1, u_2$ by $s_1^0 = u_1 + u_2$ and $s_2^0 = u_1u_2$. We
notice that $f^0 = s_1^0s_2^0$.

The second step is to compare f with the polynomial $s_1s_2$ in three variables. We
compute the polynomial $f - s_1s_2$, where $s_1 = u_1 + u_2 + u_3$ and $s_2 = u_1u_2 + u_1u_3 + u_2u_3$,
finding that

$f - s_1s_2 = -3u_1u_2u_3$.

We recognize this polynomial as $-3s_3$. So $f = s_1s_2 - 3s_3$.

The general case is similar. There is nothing to show when n = 1, because
$u_1 = s_1$ in that case. Proceeding by induction, we assume the theorem proved for
n - 1 variables. Given a symmetric polynomial f in $u_1, \ldots, u_n$, we consider the
polynomial $f^0$ obtained by substituting zero for the last variable: $f^0(u_1, \ldots, u_{n-1}) =
f(u_1, \ldots, u_{n-1}, 0)$. We note that $f^0$ is a symmetric polynomial in $u_1, \ldots, u_{n-1}$. By the
induction hypothesis, $f^0$ may be expressed as a polynomial in the elementary
symmetric functions in $\{u_1, \ldots, u_{n-1}\}$, which we denote by

$s_1^0 = u_1 + \cdots + u_{n-1}, \quad \ldots, \quad s_{n-1}^0 = u_1 \cdots u_{n-1}$.

So we can write $f^0 = g(s_1^0, \ldots, s_{n-1}^0)$. Moreover, it follows from the definition of
the polynomials $s_i$ that

$s_1^0 = s_1(u_1, \ldots, u_{n-1}, 0), \quad \ldots, \quad s_{n-1}^0 = s_{n-1}(u_1, \ldots, u_{n-1}, 0)$.

Consider the polynomial

$p(u_1, \ldots, u_n) = f(u_1, \ldots, u_n) - g(s_1, \ldots, s_{n-1})$

as a polynomial in $u_1, \ldots, u_n$. Being a difference of symmetric polynomials, this
polynomial is symmetric. Also, it has the property that $p(u_1, \ldots, u_{n-1}, 0) = 0$.
Therefore every monomial occurring in p is divisible by $u_n$. By symmetry, p is
divisible by $u_i$ for every i, and hence it is divisible by $s_n$. So

(3.16)    $f(u_1, \ldots, u_n) = g(s_1, \ldots, s_{n-1}) + s_nh(u_1, \ldots, u_n)$

for some symmetric polynomial h. We now work on $h(u_1, \ldots, u_n)$. By induction on
the degree, h is a polynomial in the symmetric functions, and hence so is f.

It remains to prove the uniqueness of $\varphi(s_1, \ldots, s_n)$. The uniqueness means that
there is only one polynomial $\varphi(z_1, \ldots, z_n)$ in the variables $z_i$, such that
$\varphi(s_1, \ldots, s_n) = f(u_1, \ldots, u_n)$, as polynomials in $u_1, \ldots, u_n$. In other words, the kernel
of the substitution map

$\sigma: R[z] \to R[u]$

sending $z_i \mapsto s_i$ is zero. To show this, suppose $\varphi(s_1, \ldots, s_n) = 0$ for some
$\varphi \in R[z]$. Setting $u_n = 0$ in this expression we still get zero: $\varphi(s_1^0, \ldots, s_{n-1}^0, 0) = 0$.
By induction on n, this implies that $\varphi(z_1, \ldots, z_{n-1}, 0) = 0$. Therefore $z_n$ divides
$\varphi(z)$, and we may write $\varphi(z) = z_n\psi(z)$. Then $0 = \varphi(s) = s_n\psi(s) = u_1 \cdots u_n\psi(s)$.
Since the product $u_1 \cdots u_n$ is not a zero-divisor in the polynomial ring R[u],
$\psi(s) = 0$. The polynomial $\psi(z)$ has lower total degree in z than $\varphi(z)$, so we may
apply induction on the degree to conclude that $\psi = 0$. Hence $\varphi = 0$ too. □


Now suppose that R = F is a field. Then we may also consider the field of
rational functions in the variables $u_i$, that is, the field of fractions of $F[u_1, \ldots, u_n]$. The
symmetric group also acts on this field, and the corresponding assertion is true:


(3.17) Theorem. Every symmetric rational function is a rational function in
$s_1, \ldots, s_n$.


Proof. Let r(u) = f(u)/g(u) be a symmetric rational function, where
$f, g \in F[u]$. We can build a symmetric function from g by multiplying all the $\sigma g$
together:

$G = \prod_{\sigma \in S_n} \sigma g$

is a symmetric polynomial. Then G(u)r(u) is a symmetric rational function, and it is
also a polynomial in $\{u_1, \ldots, u_n\}$, hence a symmetric polynomial. By Theorem (3.4), G(u)
and G(u)r(u) are polynomials in the elementary symmetric functions $\{s_i\}$. Thus r(u)
is a rational function in $\{s_i\}$. □


The pair of fields

(3.18)    $F(s) = F(s_1, \ldots, s_n) \subset F(u_1, \ldots, u_n) = F(u)$

is an example of a Galois extension. This follows from Theorem (1.11), because
F(u) is a splitting field of the polynomial p(x) (3.3) and because the roots $u_1, \ldots, u_n$
are distinct. By Proposition (1.14), the Galois group G = G(F(u)/F(s)) operates
faithfully on the roots. On the other hand, G contains the full symmetric group, by
construction. Therefore $G = S_n$. As a corollary, we find that [F(u) : F(s)] = n!.
Needless to say, this can be proved directly.


4. PRIMITIVE ELEMENTS 


At the end of the first section, we saw that generically chosen elements of a bi- 
quadratic extension K/F generate K. It is possible to derive a general statement of 
this type as a corollary of the Main Theorem of Galois theory. But we are going to 
prove it directly instead, and then use this fact in the proof of the Main Theorem. 


(4.1) Theorem. Existence of a primitive element: Let K be a finite extension of a
field F of characteristic zero. There is an element $\gamma \in K$ such that $K = F(\gamma)$.


An element $\gamma$ which generates a field extension K/F is called a primitive
element for K over F. So the theorem can be restated by saying that every finite
extension K of a field F has a primitive element. We have restated our general hypothesis
that F has characteristic zero here because this theorem is not true for all fields of
characteristic p.


Proof of Theorem (4.1). We use induction on the number of generators of K.
Say that $K = F(\alpha_1, \ldots, \alpha_n)$. If n = 1, there is nothing to prove. For n > 1, the
induction principle allows us to assume the theorem true for the intermediate field
$K_1 = F(\alpha_1, \ldots, \alpha_{n-1})$. So we may assume that $K_1$ is generated by a single element $\beta$.
Then $K = K_1(\alpha_n) = F(\beta, \alpha_n)$. We have to show that this field has a primitive
element. We are thereby reduced to the case that n = 2, so that K is generated by two
elements $\alpha, \beta$.

Let f(x), g(x) be the irreducible polynomials for $\alpha, \beta$ over F, and let K' be
an extension of K in which f and g split completely [Chapter 13 (5.3)]. Call their
roots $\alpha = \alpha_1, \ldots, \alpha_m$ and $\beta = \beta_1, \ldots, \beta_n$. By Chapter 13 (5.8), the elements $\alpha_i$ are
distinct.

We are going to show that for most choices of $c \in F$, the linear combination
$\gamma = \beta + c\alpha$ generates K. Let us denote the field $F(\gamma)$ by L. It suffices to show that
$\alpha \in L$, because if so, then $\beta = \gamma - c\alpha$ will be in L too, and this will imply that
L = K. The way we show that $\alpha$ is in L is indirect: We determine its irreducible
polynomial over L. As we know, this is the monic polynomial of least degree in
L[x] which has $\alpha$ as a root.

To begin with, $\alpha$ is a root of f(x). The trick is to use the polynomial g(x) to
cook up a second polynomial with the root $\alpha$, namely $h(x) = g(\gamma - cx)$. Notice
that h(x) has coefficients in L and that $h(\alpha) = 0$. If we show that the greatest
common divisor of f and h in L[x] is $x - \alpha$, then it will follow that $-\alpha$, being one of the
coefficients of $x - \alpha$, is in L. Now the monic greatest common divisor of f and h is
the same, whether computed in L[x] or in K'[x] [Chapter 13 (5.4)]. So we may
make our computation in K'[x]. In that ring, f is a product of the linear factors
$x - \alpha_i$, and it suffices to show that none of them divides h, that is, that none of the
elements $\alpha_i$, except for $\alpha = \alpha_1$ itself, is a root of h(x). Having gotten this far, the
rest is just a matter of computing the roots of h.

Since the roots of g are $\beta_j$, the roots of $h(x) = g(\gamma - cx)$ are obtained by
solving the equations


$\gamma - cx = \beta_j$


for x. Since $\gamma = \beta + c\alpha$, the roots are $(\gamma - \beta_j)/c = (\beta - \beta_j)/c + \alpha$. We want
these roots to be different from $\alpha_i$, $i \neq 1$. This will be so provided that c does not
take one of the finitely many values


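(4.2)    $\dfrac{\beta - \beta_j}{\alpha_i - \alpha}$,

with $i \neq 1$. (The index j = 1 gives the value 0, so c = 0 is excluded as well.) □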


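(4.3) Example. Consider the field $K = \mathbb{Q}[i, \sqrt[3]{2}]$. This field has degree 6 over $\mathbb{Q}$
[see Chapter 13 (3.5d)]. In the notation of the previous proof, we have $\beta_1 = i$,
$\beta_2 = -i$, and $\alpha_1 = \sqrt[3]{2}$, $\alpha_2 = \zeta\sqrt[3]{2}$, $\alpha_3 = \zeta^2\sqrt[3]{2}$, where $\zeta = e^{2\pi i/3}$. Condition
(4.2) becomes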

$c \neq 0$  and  $c \neq \dfrac{2i}{(\zeta^k - 1)\sqrt[3]{2}}$,  k = 1, 2.


This condition holds for all $c \in \mathbb{Q}$ except c = 0. Therefore $\gamma = i + c\sqrt[3]{2}$
generates K over $\mathbb{Q}$ for all rational numbers $c \neq 0$. Of course, many other combinations
of the two elements $\beta, \alpha$ will generate $F(\beta, \alpha)$. In this example, the product $i\sqrt[3]{2}$
also generates K. □


Theorem (4.1) is important for two reasons. First, explicit computation in an
extension of the form $F(\gamma)$ is easy if the irreducible equation for $\gamma$ over F is known.
Second, since finite extensions have the form $F(\gamma)$, we can derive their properties
from facts about algebraic elements. It is this aspect which is most important for us.

The power of Theorem (4.1) is shown by applying it to the study of
automorphisms of fields. Consider a finite group G of automorphisms of the field K, and
denote its fixed field $K^G$ by F.


(4.4) Proposition. Let G be a finite group of automorphisms of a field K, and let
F be its fixed field. Let $\{\beta_1, \ldots, \beta_r\}$ be the orbit of an element $\beta = \beta_1 \in K$ under
the action of G. Then $\beta$ is algebraic over F, its degree over F is r, and its irreducible
polynomial over F is $g(x) = (x - \beta_1) \cdots (x - \beta_r)$.

Note that the degree of $\beta$, being the order of an orbit, divides the order of the
group.


Proof. Let f(x) be the irreducible polynomial for $\beta$ over F. Since f(x) is fixed
by G, each of the elements $\beta_i$ is a root of f (1.14), and so g divides f. Also, g is
fixed by all permutations of $\{\beta_1, \ldots, \beta_r\}$, and hence by the operation of G, which
permutes the orbit. Therefore $g(x) \in F[x]$. Since f is irreducible, g = f. □


This proposition provides a method for determining the irreducible polynomial
for an element $\beta$ of a Galois extension K over F. For example, let K be the
biquadratic extension $\mathbb{Q}(i, \sqrt{2})$, and let $\beta = i + \sqrt{2}$. The Galois group of $K/\mathbb{Q}$ is the
Klein four group, and the orbit of $\beta$ consists of the four elements $\pm i \pm \sqrt{2}$. So the
irreducible polynomial for $\beta$ over $\mathbb{Q}$ is


$(x - i - \sqrt{2})(x - i + \sqrt{2})(x + i - \sqrt{2})(x + i + \sqrt{2})$
$= (x^2 - 2ix - 3)(x^2 + 2ix - 3) = x^4 - 2x^2 + 9$.


We can also determine this polynomial by computing powers of $\beta$ and finding the
linear relation of smallest degree between them (see Chapter 13, Section 3).
However, the method given here is preferable because it always produces an irreducible
polynomial.
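
The orbit method lends itself to a quick numerical check. The Python sketch below multiplies out the four linear factors belonging to the orbit $\pm i \pm \sqrt{2}$ in floating-point complex arithmetic and rounds the result; the helper name `poly_from_roots` is an assumption of this illustration, and the rounding step relies on knowing in advance that the coefficients are integers.

```python
import math

def poly_from_roots(roots):
    """Coefficients of the product of (x - r), highest power of x first."""
    coeffs = [1]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

sqrt2 = math.sqrt(2)
orbit = [s * 1j + t * sqrt2 for s in (1, -1) for t in (1, -1)]   # the elements ±i ± √2
coeffs = poly_from_roots(orbit)
rounded = [round(c.real) for c in coeffs]
```

The imaginary parts of `coeffs` vanish up to rounding error, and `rounded` is the coefficient list of $x^4 - 2x^2 + 9$.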


(4.5) Corollary. Let K/F be a Galois extension, and let g(x) be an irreducible
polynomial in F[x]. If g has one root in K, then it factors into linear factors in K[x].


Proof. According to Corollary (1.9), F is the fixed field of the Galois group
G = G(K/F). Let $\beta$ be a root of g(x) in K. By Proposition (4.4), the irreducible
polynomial for $\beta$ over F is $(x - \beta_1) \cdots (x - \beta_r)$, where $\{\beta_1, \ldots, \beta_r\}$ is the G-orbit of
$\beta$. Since g(x) is the irreducible polynomial for $\beta$, it is equal to this product, so it
factors into linear factors in K, as asserted. □


The corollary tells us in particular that every Galois extension is a splitting
field, which is part of Theorem (1.11). For, take any generators $\alpha, \beta, \ldots$ for K over
F, and let f(x) be the product of their irreducible polynomials. Then f splits
completely in K, and hence K is a splitting field for f.


(4.6) Theorem. Let G be a group of order n of automorphisms of a field K, and
let F be its fixed field. Then [K : F] = n.


Proof. Proposition (4.4) shows that every element $\beta$ of K is algebraic over F
and that its degree divides n = |G|. The theorem of the primitive element implies
that the degree of the whole field extension K/F is bounded by n too. To see this, we
form a chain of extension fields as follows: We choose an element $\alpha_1 \in K$ which is
not in F, and we set $F_1 = F(\alpha_1)$. Then $[F_1 : F] \leq n$. If $F_1 \neq K$, we choose an
element $\alpha_2 \in K$ which is not in $F_1$, and we set $F_2 = F(\alpha_1, \alpha_2)$. By the theorem of the
primitive element, $F_2$ is generated by a single element $\gamma$, and by Corollary (3.6) of
Chapter 13, the degree of $\gamma$ over F is bounded by n. So $[F_2 : F] \leq n$. Continuing in
this way, we obtain a chain $F < F_1 < F_2 < \cdots$ in which $[F_i : F] \leq n$ for all i. This
chain must be finite. So $F_i = K$ for some i, and $[K : F] \leq n$.

Applying Theorem (4.1) once more, we conclude that K has a primitive
element: $K = F(\beta)$. Any element of G which fixes $\beta$ acts as the identity on
$K = F(\beta)$. Since we are assuming that G is a group of automorphisms of K, the
identity is the only such element. Therefore the stabilizer of $\beta$ is {1}, and the orbit
has order n. By Proposition (4.4), $\beta$ has degree n over F, and [K : F] = n. □


Using the theorem we have just proved, we can derive the first theorem,
Theorem (1.6), which was stated in Section 1. That theorem says that for any finite
extension K/F, the order of its Galois group divides its degree. To prove this, we set
G = G(K/F). Then G operates on K, so by Theorem (4.6), $|G| = [K : K^G]$. And
since $F \subset K^G \subset K$, $[K : K^G]$ divides [K : F]. □

Theorem (4.6) also provides us with a converse to Corollary (1.9):


(4.7) Corollary. Let G be a finite group of automorphisms of a field K, and let F
be its fixed field. Then K is a Galois extension of F, and its Galois group is G.


Proof. By definition of the fixed field, the elements of G are F-automorphisms
of K. Hence $G \subset G(K/F)$. Since $|G(K/F)| \leq [K : F]$ and [K : F] = |G|, it
follows that |G(K/F)| = [K : F] and that G = G(K/F). □


We can get some interesting examples to illustrate Proposition (4.4) and
Theorem (4.6) by considering automorphisms of the field $\mathbb{C}(y) = K$ of rational functions
in y. For instance, let $\sigma$ be the automorphism defined by $y \mapsto iy^{-1}$ and let $\tau$ be the
automorphism defined by $y \mapsto -y$. Each of them has order 2 and they commute, so
together they generate a group G of order 4.


(4.8) Proposition. Let K and G be as above. The fixed field $F = K^G$ is the field
$\mathbb{C}(w)$ of rational functions in $w = y^2 - y^{-2}$.


In other words, every rational function f(y) which is fixed by $\sigma$ and $\tau$ can be expressed as
a rational function in w.


Proof. First of all, $\sigma$ and $\tau$ do fix $w = y^2 - y^{-2}$, so w is in the fixed field.
Therefore the fixed field F contains the field $\mathbb{C}(w)$. Next, we compute the irreducible
polynomial for y over F. The orbit of y is $\{y, iy^{-1}, -y, -iy^{-1}\}$, so Proposition (4.4) tells us
that the irreducible equation for y is $(x - y)(x - iy^{-1})(x + y)(x + iy^{-1}) =
x^4 - wx^2 - 1$. This polynomial has coefficients in $\mathbb{C}(w)$, so y has degree at most 4 over that
field. It follows that $[K : \mathbb{C}(w)] \leq 4$. On the other hand, $\mathbb{C}(w) \subset F \subset K$, and since
|G| = 4, Theorem (4.6) tells us that [K : F] = 4. Counting degrees shows that
$\mathbb{C}(w) = F$. □
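
The computation in this proof can be spot-checked by substituting numbers for y and x. The Python sketch below evaluates the product over the orbit $\{y, iy^{-1}, -y, -iy^{-1}\}$ at arbitrarily chosen sample points and compares it with $x^4 - wx^2 - 1$; it also checks that w takes the same value on every element of the orbit. The sample values and function names are assumptions of this illustration, and agreement at one point is evidence, not a proof.

```python
def w(y):
    """The rational function w = y^2 - y^(-2)."""
    return y**2 - y**-2

def orbit(y):
    """The four elements y, i/y, -y, -i/y."""
    return [y, 1j / y, -y, -1j / y]

def orbit_polynomial_at(x, y):
    """Value at x of the product of (x - b) over the orbit of y."""
    value = 1
    for b in orbit(y):
        value *= (x - b)
    return value

y0, x0 = 0.7 + 0.4j, -1.3 + 2.1j     # arbitrary sample points
lhs = orbit_polynomial_at(x0, y0)
rhs = x0**4 - w(y0) * x0**2 - 1
```

Here `lhs` and `rhs` agree up to rounding error, as the factorization $(x^2 - y^2)(x^2 + y^{-2}) = x^4 - wx^2 - 1$ predicts.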


A famous theorem called Lüroth’s theorem asserts that any subfield of the field
$\mathbb{C}(y)$ which properly contains the complex numbers is the field of rational functions
in some rational function w of y.




5. PROOF OF THE MAIN THEOREM 


Let f(x) be a monic polynomial of degree n with coefficients in a field F. We recall
that a splitting field of $f(x) \in F[x]$ is a field of the form $K = F(\alpha_1, \ldots, \alpha_n)$, such
that $f(x) = (x - \alpha_1) \cdots (x - \alpha_n)$ in K[x]. The existence of a splitting field was
proved in Chapter 13 (5.3). We now want to show that any two splitting fields of a
given polynomial f(x) are isomorphic. This follows from the fact that a field
extension of the form $F(\alpha)$ is determined by the irreducible polynomial for $\alpha$ over F, and
from some “bookkeeping.” The bookkeeping required for the proof is notationally a
little confusing, but not difficult.

Any isomorphism $\varphi: F \to \tilde{F}$ of fields extends to an isomorphism
$F[x] \to \tilde{F}[x]$ between the polynomial rings by


$a_nx^n + \cdots + a_1x + a_0 \mapsto \tilde{a}_nx^n + \cdots + \tilde{a}_1x + \tilde{a}_0$,


where $\tilde{a}_i = \varphi(a_i)$. Let us denote the image of f(x) by $\tilde{f}(x)$. Since $\varphi$ is an
isomorphism, $\tilde{f}(x)$ will be an irreducible polynomial if and only if f(x) is irreducible.
The following lemma generalizes Chapter 13 (2.9).


(5.1) Lemma. With the above notation, let f(x) be an irreducible polynomial in
F[x]. Let $\alpha$ be a root of f(x) in an extension field K of F, and let $\tilde{\alpha}$ be a root of $\tilde{f}(x)$
in an extension $\tilde{K}$ of $\tilde{F}$. There is a unique isomorphism


$\varphi_1: F(\alpha) \to \tilde{F}(\tilde{\alpha})$

which restricts to $\varphi$ on the subfield F, and which sends $\alpha$ to $\tilde{\alpha}$.


Proof. We know that $F(\alpha)$ is isomorphic to the quotient F[x]/(f), and
similarly $\tilde{F}(\tilde{\alpha})$ is isomorphic to $\tilde{F}[x]/(\tilde{f})$. The rings F[x] and $\tilde{F}[x]$ are isomorphic, as
we just saw, and since f and $\tilde{f}$ correspond under this isomorphism, so do the ideals
(f) and $(\tilde{f})$ which they generate. Therefore the residue rings F[x]/(f) and $\tilde{F}[x]/(\tilde{f})$
are also isomorphic. Combining these isomorphisms yields the required
isomorphism $\varphi_1$. This extension of $\varphi$ is unique because $\alpha$ generates $F(\alpha)$ over F. □


(5.2) Proposition. Let $\varphi: F \to \tilde{F}$ be an isomorphism of fields. Let f(x) be a
nonconstant polynomial in F[x], and let $\tilde{f}(x)$ be the corresponding polynomial in
$\tilde{F}[x]$. Let K and $\tilde{K}$ be splitting fields for f(x) and $\tilde{f}(x)$. There is an isomorphism
$\psi: K \to \tilde{K}$ which restricts to $\varphi$ on the subfield F of K.


If we let $F = \tilde{F}$ and $\varphi$ = identity, we obtain the following corollary:

(5.3) Corollary. Any two splitting fields of $f(x) \in F[x]$ over F are isomorphic. □


The corollary is the result we are really after. The auxiliary isomorphism $\varphi$ is
introduced into the proposition to make the induction step of the proof work.

Proof of Proposition (5.2). If f(x) factors into linear factors over F, then $\tilde{f}(x)$
also factors into linear factors. In this case K = F and $\tilde{K} = \tilde{F}$, so $\psi = \varphi$. Assume
that f does not split completely. Choose an irreducible factor g(x) of f(x) of degree
> 1. The corresponding polynomial $\tilde{g}(x)$ will be an irreducible factor of $\tilde{f}(x)$. Let $\alpha$
be a root of g in K and write $F_1 = F(\alpha)$. Make a similar choice of $\tilde{\alpha}$ and $\tilde{F}_1 = \tilde{F}(\tilde{\alpha})$
in $\tilde{K}$. Then by Lemma (5.1), we can extend $\varphi$ to an isomorphism $\varphi_1: F_1 \to \tilde{F}_1$
which sends $\alpha \mapsto \tilde{\alpha}$. Being a splitting field for f over F, K is also a splitting field of
f over the larger field $F_1$, and similarly $\tilde{K}$ is a splitting field for $\tilde{f}$ over $\tilde{F}_1$. Therefore
we may replace $F, \tilde{F}, \varphi$ by $F_1, \tilde{F}_1, \varphi_1$ and proceed by induction on the degree of K
over F. □


We are now in a position to prove the second of the theorems, Theorem 
(1.11), which was announced in Section 1. One part of this theorem was proved in 
the last section, using Corollary (4.5). For convenience, we restate the other part 
here. 


Theorem. Let $K$ be the splitting field of a polynomial $f(x) \in F[x]$. Then $K$ is a
Galois extension of $F$; that is, $|G(K/F)| = [K:F]$.


We will prove the theorem by going back over the proof of Proposition (5.2), keep- 
ing careful track of the number of choices. 


(5.4) Lemma. With the notation of (5.2), the number of isomorphisms
$\psi: K \to \tilde K$ extending $\varphi$ is equal to the degree $[K:F]$.


The theorem follows from this lemma if we set $\tilde F = F$, $\tilde K = K$, and $\varphi$ = identity. □


Proof of Lemma (5.4). We proceed as in the proof of Proposition (5.2), choosing
an irreducible factor $g(x)$ of $f(x)$ and one of the roots $\alpha$ of $g(x)$ in $K$. Let
$F_1 = F(\alpha)$. Any isomorphism $\psi: K \to \tilde K$ extending $\varphi$ will send $F_1$ to some
subfield $\tilde F_1$ of $\tilde K$. This field $\tilde F_1$ will have the form $\tilde F(\tilde\alpha)$, where
$\tilde\alpha = \psi(\alpha)$ is a root of $\tilde g(x)$ in $\tilde K$.

Conversely, to extend $\varphi$ to $\psi$, we may start by choosing any root $\tilde\alpha$ of $\tilde g(x)$ in
$\tilde K$. We then extend $\varphi$ to a map $\varphi_1: F_1 \to \tilde F_1 = \tilde F(\tilde\alpha)$ by setting
$\varphi_1(\alpha) = \tilde\alpha$. We use induction on $[K:F]$. Since $[K:F_1] < [K:F]$, the induction
hypothesis tells us that for this particular choice of $\varphi_1$, there are $[K:F_1]$
extensions of $\varphi_1$ to an isomorphism $\psi: K \to \tilde K$. On the other hand, $\tilde g$ has
distinct roots in $\tilde K$ because $g$ and $\tilde g$ are irreducible [Chapter 13 (5.8)]. So the
number of choices for $\tilde\alpha$ is the degree of $\tilde g$, which is $[F_1:F]$. There are
$[F_1:F]$ choices for the isomorphism $\varphi_1$. This gives us a total of
$[K:F_1][F_1:F] = [K:F]$ extensions of $\varphi$ to $\psi: K \to \tilde K$. □


Since any two splitting fields $K$ of a polynomial $f(x) \in F[x]$ are isomorphic,
the Galois group $G(K/F)$ depends, up to isomorphism, only on $f$. It is often referred
to as the Galois group of the polynomial over $F$.

The following corollary collects together several criteria for an extension to be
Galois. Most of them have already been proved, and we leave the remaining proofs
as exercises.


(5.5) Corollary. Let $K/F$ be a finite field extension. The following are equivalent:


(i) $K$ is a Galois extension of $F$;
(ii) $K$ is the splitting field of an irreducible polynomial $f(x) \in F[x]$;
(ii') $K$ is the splitting field of a polynomial $f(x) \in F[x]$;
(iii) $F$ is the fixed field for the action of the Galois group $G(K/F)$ on $K$;
(iii') $F$ is the fixed field for an action of a finite group of automorphisms of $K$. □


We now have enough information to prove the Main Theorem of Galois the- 
ory, which relates intermediate fields to subgroups of the Galois group. 


Proof of Theorem (1.15). Let $K/F$ be a Galois extension. We have to show
that the maps

$L \mapsto G(K/L)$ and $H \mapsto K^H$

are inverse functions between the set of intermediate fields and the set of subgroups
of $G = G(K/F)$. To do so, we verify that the composition of these two maps in
either direction is the identity.

Let $L$ be an intermediate field. The corresponding subgroup of $G$ is
$H = G(K/L)$. By definition, $H$ acts trivially on $L$, so $L \subset K^H$. On the other hand,
$K$ is a Galois extension of $L$ by (1.13); hence $[K:L] = |H|$. By Theorem (4.6),
$|H| = [K:K^H]$, so $L = K^H$.

In the other direction, suppose that we start with a subgroup $H \subset G$, and
let $L = K^H$. Then $H \subset G(K/L)$. But $|H| = [K:K^H] = [K:L] = |G(K/L)|$.
Therefore $H = G(K/L)$. This shows that the two maps are inverses, as required.
Since $K$ is a Galois extension of $L = K^H$, $[K:L] = |H|$, and $[L:F] = [G:H]$. □


The correspondence given by the Main Theorem has some surrounding details
which we will now discuss. First of all, the correspondence between fields and
subgroups is order reversing, that is, if $L, L'$ are two intermediate fields and if
$H = G(K/L)$, $H' = G(K/L')$ are the corresponding subgroups, then $L \subset L'$ if
and only if $H \supset H'$. This is clear from the definitions of the maps and is consistent
with the relations (1.16).

To complete the picture, we will show that the intermediate fields $L$ which are
Galois extensions of $F$ correspond to the normal subgroups of $G$. Let $L$ be an
intermediate field. An $F$-automorphism $\sigma$ of $K$ will carry $L$ to some intermediate field $\sigma L$
which may or may not be the same as $L$. We call $\sigma L$ a conjugate subfield.


(5.6) Theorem. Let $K/F$ be a Galois extension, and let $L$ be an intermediate field.
Let $H = G(K/L)$ be the corresponding subgroup of $G = G(K/F)$.

(a) Let $\sigma$ be an element of $G$. The subgroup of $G$ which corresponds to the
conjugate subfield $\sigma L$ is the conjugate subgroup $\sigma H \sigma^{-1}$. In other words,
$G(K/\sigma L) = \sigma H \sigma^{-1}$.

(b) $L$ is a Galois extension of $F$ if and only if $H$ is a normal subgroup of $G$. When
this is so, then $G(L/F)$ is isomorphic to the quotient group $G/H$:


(5.7) Diagram. [Tower of fields $F \subset L \subset K$: the group $G = G(K/F)$ operates on $K$,
fixing $F$; the subgroup $H = G(K/L)$ operates on $K$, fixing $L$; if $H$ is normal,
then $G/H \cong G(L/F)$ operates on $L$.]


(5.8) Example. In the case of the cubic equation (2.1) whose splitting field has
degree 6, the only intermediate extension which is Galois, other than $F$ and $K$, is $F(\delta)$,
which corresponds to the alternating group $H = A_3 \subset S_3$. The Galois group
$G(F(\delta)/F)$ is cyclic of order 2, as is the quotient group $S_3/A_3$. The three fields
$F(\alpha_i)$ are conjugate. This agrees with the fact that the three subgroups of $S_3$ of order
2 are conjugate.
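The group-theoretic side of Example (5.8) can be checked by brute force. The following Python sketch enumerates the subgroups of $S_3$ of orders 2 and 3 and tests normality by conjugation. Encoding permutations as tuples on {0, 1, 2} and all function names are assumptions made for this illustration; they are not notation from the text.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p after q (apply q first)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def conjugate(sigma, H):
    """Return the conjugate subgroup sigma H sigma^{-1} as a frozenset."""
    s_inv = inverse(sigma)
    return frozenset(compose(compose(sigma, h), s_inv) for h in H)

S3 = list(permutations(range(3)))
identity = (0, 1, 2)

# The subgroups of order 2 are generated by the three transpositions.
transpositions = [p for p in S3 if p != identity and compose(p, p) == identity]
order2 = [frozenset({identity, t}) for t in transpositions]

# A3 consists of the identity and the two 3-cycles.
A3 = frozenset(p for p in S3 if p not in transpositions)

def is_normal(H):
    """True if every conjugate of H in S3 equals H."""
    return all(conjugate(s, H) == H for s in S3)

def conjugacy_class(H):
    """The set of all conjugates of H in S3."""
    return {conjugate(s, H) for s in S3}
```

Here `is_normal(A3)` holds, while the conjugacy class of any one subgroup of order 2 is the full list `order2`, which mirrors the statement that the three fields $F(\alpha_i)$ are conjugate.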


Proof of Theorem (5.6). (a) Let $\sigma L = L'$. If $\tau$ is an element of
$H = G(K/L)$, then $\sigma\tau\sigma^{-1}$ is in $H' = G(K/L')$. To check this, we must show that
$\sigma\tau\sigma^{-1}$ fixes any element $\alpha' \in L'$. By definition of $\sigma L$, $\alpha' = \sigma(\alpha)$ for some
$\alpha \in L$. Then $\sigma\tau\sigma^{-1}(\alpha') = \sigma\tau(\alpha) = \sigma(\alpha) = \alpha'$, as required. It follows that
$H' \supset \sigma H \sigma^{-1}$ and by symmetry, or by counting elements, that $H' = \sigma H \sigma^{-1}$. The
fact which we have just checked is actually a general property of group actions on
sets [Chapter 5 (6.4)].


(b) Now suppose that $H$ is normal. Then $H = \sigma H \sigma^{-1}$ for all $\sigma \in G$; hence
$G(K/L) = G(K/\sigma L)$. This implies that $L = \sigma L$ for all $\sigma$ [see (1.9)]. Thus every
$F$-automorphism of $K$ carries $L$ to itself and hence defines an $F$-automorphism of $L$ by
restriction. This restriction defines a homomorphism

(5.9) $\pi: G \to G(L/F)$.

Its kernel is the set of $\sigma \in G$ which induce the identity on $L$, which is $H$.
Therefore $G/H$ is isomorphic to a subgroup of $G(L/F)$. Counting degrees and orders, we
find

$[L:F] = |G/H| \le |G(L/F)|$.

Since the order of $G(L/F)$ is at most $[L:F]$, equality holds. It follows that $L$ is a
Galois extension and that $G/H \approx G(L/F)$.

Conversely, suppose that $L/F$ is Galois. Then $L$ is a splitting field of some
polynomial $g(x) \in F[x]$; that is, $L = F(\beta_1,\dots,\beta_k)$, where $\beta_i$ are the roots of $g(x)$
in $K$. An $F$-automorphism $\sigma$ of $K$ permutes these roots and therefore carries $L$ to
itself: $L = \sigma L$. By (a), $H = \sigma H \sigma^{-1}$; thus $H$ is a normal subgroup. □


6. QUARTIC EQUATIONS 


Let $K/F$ be a Galois extension. We have seen that if $\beta$ is an element of $K$ whose
monic irreducible polynomial over $F$ is $g(x)$, then $g$ splits completely in $K$, and the
$G$-orbit of $\beta$ is the set of roots of $g$ (4.4). So $G$ operates transitively on the roots of
an irreducible polynomial $g \in F[x]$, provided that this polynomial has at least one
root in $K$. Combining this observation with Proposition (1.14), we find:


(6.1) Proposition. Let $K/F$ be a splitting field of a polynomial $f(x) \in F[x]$. The
Galois group $G$ of $K/F$ operates faithfully on the set $\{\alpha_1,\dots,\alpha_n\}$ of roots of $f$. Hence
this operation represents $G$ as a subgroup of the symmetric group $S_n$. The roots form
a single orbit if and only if $f$ is irreducible over $F$. □


When the Galois extension $K$ is exhibited as the splitting field of a polynomial of
degree $n$, it is customary to view the Galois group $G$ as a subgroup of the symmetric
group $S_n$. If the polynomial $f$ is irreducible, then it is a transitive subgroup, which
means that it acts transitively on the indices $\{1,\dots,n\}$. However, the same Galois
extension $K/F$ can be exhibited as a splitting field of many polynomials, so this
representation of $G$ as a subgroup of $S_n$ is not unique.

For instance, let $K/F$ be the splitting field of an irreducible cubic equation such
that $[K:F] = 6$. Then the Galois group is represented as the whole symmetric
group $S_3$. However, the theorem of the primitive element tells us that $K$ can also be
generated by a single element $\gamma$. Since $[K:F] = 6$, $\gamma$ has degree 6 over $F$. This
means that its orbit has order 6 and that its irreducible polynomial has degree 6. So
if we think of $K$ as the splitting field of this sextic polynomial, the Galois group is
represented as a subgroup of $S_6$. This isn't a very economical way to represent $S_3$.

Let us suppose that our Galois extension $K$ is the splitting field of a polynomial
$f(x)$ and that its roots in $K$ are $\alpha_1,\dots,\alpha_n$. Then, viewing $G$ as a subgroup of $S_n$, we
may pose the following two problems:


(6.2) (i) Given a subgroup $\mathcal{H}$ of $S_n$, decide if $G \subset \mathcal{H}$.
(ii) Determine $G$.


If we could solve (i) for every subgroup $\mathcal{H}$, then (ii) would also be solved.

Lagrange's approach to these problems is to look for functions of the roots
which are partially symmetric. A partially symmetric polynomial is a polynomial
$p(u_1,\dots,u_n)$ in the variables $\{u_1,\dots,u_n\}$ which is left fixed by the permutations in a
given subgroup $\mathcal{H}$ of $S_n$ but not by any other permutations. For example, we saw in
(2.13) that

$(u_1 - u_2)(u_1 - u_3)(u_2 - u_3)$

is a partially symmetric function for the alternating group, when $n = 3$. There is no
difficulty in generalizing this construction to arbitrary $n$ by defining

(6.3) $\delta(u) = (u_1 - u_2)(u_1 - u_3) \cdots (u_{n-1} - u_n) = \prod_{i < j} (u_i - u_j)$.

This element is a square root of the discriminant (3.6). The effect of a permutation
of the indices is to multiply $\delta$ by the sign of the permutation. Having this partially
symmetric function in hand, we substitute the roots $\alpha_1,\dots,\alpha_n$ of our polynomial into
it, to obtain an element $\delta(\alpha) = \delta$ of $K$ which is fixed by even permutations of the
roots. We can decide whether or not $\delta$ is in $F$ by determining whether or not the
discriminant $D$ is a square. This will provide information about the Galois group.


(6.4) Proposition. Let $K/F$ be a Galois extension which is the splitting field of an
irreducible polynomial $f(x) \in F[x]$ of degree $n$. Let $\alpha_1,\dots,\alpha_n$ be the roots of $f(x)$ in
$K$, and let $\delta = \delta(\alpha)$. Then $\delta \ne 0$. Moreover:


(a) $\delta \in F$ if and only if the Galois group $G$ is a subgroup of the alternating group
$A_n$.

(b) In any case, the subgroup $G(K/F(\delta))$ of $G$ is contained in the alternating
group.

Proof. The case $\delta = 0$ occurs only if two of the roots are equal, and this cannot
happen if $f$ is irreducible [Chapter 13 (5.8)]. Next, assume that $\delta$ is in $F$. Since
odd permutations send $\delta \mapsto -\delta$ and since $\delta \ne 0$, odd permutations don't fix $\delta$. On
the other hand, the elements of $F$ are fixed by every automorphism in $G$. It follows
that $G$ does not contain any odd permutations, hence that $G \subset A_n$. Conversely, if
$\delta \notin F$, we use the fact that $K^G = F$. There must be an element of $G$ which doesn't
fix $\delta$. This element will be an odd permutation, so $G \not\subset A_n$. This proves (a). Part (b)
follows from (a) when we replace $F$ by $F(\delta)$. □
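The key fact used in this proof, that a permutation of the indices multiplies $\delta$ by the sign of the permutation, can be verified on sample values. The sketch below is only a numerical illustration under assumed conventions: permutations are tuples on {0, ..., n-1}, the sign is computed by counting inversions, and the function names are hypothetical.

```python
from itertools import permutations

def delta(u):
    """The product of (u_i - u_j) over all i < j, as in (6.3)."""
    result = 1
    for i in range(len(u)):
        for j in range(i + 1, len(u)):
            result *= u[i] - u[j]
    return result

def sign(p):
    """Sign of the permutation p, computed from its number of inversions."""
    inversions = sum(
        1
        for i in range(len(p))
        for j in range(i + 1, len(p))
        if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1

def permuted(u, p):
    """Substitute u_{p(i)} for u_i."""
    return [u[p[i]] for i in range(len(u))]

def check_sign_rule(u):
    """True if delta changes by sign(p) under every permutation p."""
    return all(
        delta(permuted(u, p)) == sign(p) * delta(u)
        for p in permutations(range(len(u)))
    )
```

For instance, `check_sign_rule([2, 3, 5, 11])` tests all 24 permutations of four distinct integers.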


We will now discuss quartic equations, beginning with an interesting special
case which is controlled by the discriminant. We consider a complex number which
is presented as a nested square root, say $\alpha = \sqrt{r + s\sqrt{t}}$, where $r, s, t$ are in a field
$F$. The numbers

(6.5) $\sqrt{3 + 2\sqrt{2}},\quad \sqrt{5 + \sqrt{21}},\quad \sqrt{7 + 2\sqrt{5}},\quad \sqrt{5 + 2\sqrt{5}}$

are a few samples. We ask the following question: Is there an expression for $\alpha$ in
terms of two square roots which are not nested?

Since $\alpha^2 = r + s\sqrt{t}$, it is easy to write down a quartic polynomial which has
$\alpha$ as a root, namely

(6.6) $f(x) = (x^2 - (r + s\sqrt{t}))(x^2 - (r - s\sqrt{t})) = x^4 + bx^2 + c$,

where $b = -2r$ and $c = r^2 - s^2 t$. If $\alpha'$ denotes one of the two square roots of
$r - s\sqrt{t}$, then the roots of this quartic are

(6.7) $\alpha,\ \alpha',\ -\alpha,\ -\alpha'$.
The splitting field $K = F(\alpha, \alpha')$ of $f$ can be reached by the sequence $\sqrt{t}, \alpha, \alpha'$ of
three square root adjunctions, so the degree $[K:F]$ divides 8. The degree will be
less than 8 if one of the square root adjunctions is unnecessary.

We must decide whether or not $f$ is irreducible. To do so, we first check the
irreducibility of the quadratic polynomial $q(y) = y^2 + by + c$ whose roots are
$\alpha^2, \alpha'^2$. If $q$ is irreducible, then $f$ doesn't have a root in $F$. In that case $f$, if
reducible, will be the product of two quadratic polynomials. Computing with
undetermined coefficients, we find that the product must have the form

(6.8) $x^4 + bx^2 + c = (x^2 + ux + v)(x^2 - ux + v)$.

We will be able to determine whether or not such a factorization exists, at least when
$F = \mathbb{Q}$.

If $f(x)$ is reducible, then $\alpha$ is a root of a quadratic polynomial, so it can be
written using only one square root. This happens with $\sqrt{3 + 2\sqrt{2}}$ for example,
which is equal to $1 + \sqrt{2}$, as you will check by squaring both expressions. The
quartics derived from the other examples (6.5) are irreducible over $\mathbb{Q}$.

We now return to our question. Let's suppose that $f$ is irreducible. Notice that
to write $\alpha$ in terms of unnested square roots $\sqrt{p}, \sqrt{q}$ amounts to finding a
biquadratic extension $K = F(\sqrt{p}, \sqrt{q})$ of $F$ which contains $\alpha$. Suppose that a
biquadratic extension $K$ which contains $\alpha$ can be found. Then $K$ is a Galois extension
of $F$, so $f(x)$ factors into linear factors in $K$. This means that $K$ contains a splitting
field of $f$. In fact, $K$ will be the splitting field, because $f$ is irreducible and of degree
4. So the Galois group $G$ of $f$ will be the Klein four group. If $G$ is not the Klein four
group, then $\alpha$ cannot be written in terms of unnested square roots.

Conversely, if $K/F$ is a Galois extension whose Galois group is the Klein four
group, then $K$ contains three intermediate fields of degree 2 over $F$. Any two of
these fields taken together generate $K$. So $K$ is a biquadratic extension of $F$, and any
element of $K$ can be written in terms of two unnested square roots.

We compute the discriminant of $f(x)$, using the list (6.7) of roots.

$D = \prod_{i < j} (\alpha_i - \alpha_j)^2 = (4\alpha\alpha')^2 (\alpha - \alpha')^4 (\alpha + \alpha')^4 = 2^4 (b^2 - 4c)^2 c = 2^8 s^4 t^2 (r^2 - s^2 t)$.

If $D$ is a square in $F$, then $G$ is a transitive subgroup of the alternating group $A_4$
whose order divides 8. The Klein four group is the only such group. It consists of the
identity and the even permutations of order 2:

(6.9) $V = \{(1), (12)(34), (13)(24), (14)(23)\}$.

There is no other transitive operation of $V$ on $\{1, 2, 3, 4\}$. So we find:

(6.10) Proposition. Let $\alpha = \sqrt{r + s\sqrt{t}}$, with $r, s, t \in F$, and assume that
$f(x) = x^4 - 2rx^2 + (r^2 - s^2 t)$ is irreducible over $F$. Then $\alpha$ can be written in
terms of two unnested square roots if and only if $r^2 - s^2 t$ is a square in $F$. □
If $\alpha = \sqrt{5 + \sqrt{21}}$, then $r^2 - s^2 t = 25 - 21 = 4$, which is a square. In the
last two examples (6.5), $r^2 - s^2 t$ is not a square in $\mathbb{Q}$.

Let us determine the unnested expression for $\alpha = \sqrt{5 + \sqrt{21}}$ explicitly.
Galois theory provides the clue; namely it suggests determining the intermediate
fields. They are quadratic extensions of $\mathbb{Q}$, so they are generated by square roots.
These square roots are the ones we need to express $\alpha$. One intermediate quadratic
extension is obvious, namely $\mathbb{Q}(\sqrt{21})$. But this isn't the one we need. To find
another intermediate extension, we determine the fixed field of the subgroup $H$ of
order 2 which is generated by $\sigma = (12)(34)$. If the roots of $f$ are listed in the order
(6.7), the $H$-orbit of $\alpha$ is $\{\alpha, \alpha'\}$, where $\alpha' = \sqrt{5 - \sqrt{21}}$, and the irreducible
polynomial for $\alpha$ over $K^H$ is $(x - \alpha)(x - \alpha') = x^2 - (\alpha + \alpha')x + \alpha\alpha'$. So $K$ has
degree 2 over the field $L = F(\alpha + \alpha', \alpha\alpha')$, and this field is contained in $K^H$. A
consideration of degrees shows that $L = K^H$. With this clue, we compute, finding
$\alpha\alpha' = 2$, $(\alpha + \alpha')^2 = 14$, and $\alpha + \alpha' = \sqrt{14}$. Similarly, $\alpha - \alpha' = \sqrt{6}$. We
solve for $\alpha$, obtaining $\alpha = \frac{1}{2}(\sqrt{6} + \sqrt{14})$. □
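The computation above generalizes: if $r^2 - s^2 t = m^2$, then $(\alpha + \alpha')^2 = 2r + 2m$ and $(\alpha - \alpha')^2 = 2r - 2m$, so $\alpha = \sqrt{(r+m)/2} + \sqrt{(r-m)/2}$. The following Python sketch applies this to rational inputs. It assumes $r, s, t > 0$ are rational and works over $\mathbb{Q}$ only; the function names `rational_sqrt` and `denest` are hypothetical, and the sketch tests the square condition of Proposition (6.10) without checking irreducibility of $f$.

```python
from fractions import Fraction
from math import isqrt

def rational_sqrt(q):
    """Return the nonnegative rational square root of q, or None if q is
    not the square of a rational number."""
    q = Fraction(q)
    if q < 0:
        return None
    n, d = q.numerator, q.denominator  # already in lowest terms
    rn, rd = isqrt(n), isqrt(d)
    if rn * rn == n and rd * rd == d:
        return Fraction(rn, rd)
    return None

def denest(r, s, t):
    """Try to write sqrt(r + s*sqrt(t)) as sqrt(p) + sqrt(q).

    Assumes r, s, t are positive rationals. Returns the pair (p, q), or
    None when r^2 - s^2*t is not a square in Q.
    """
    r, s, t = Fraction(r), Fraction(s), Fraction(t)
    m = rational_sqrt(r * r - s * s * t)
    if m is None:
        return None
    return (r + m) / 2, (r - m) / 2
```

With these definitions, `denest(5, 1, 21)` gives the pair (7/2, 3/2), which is the expression $\frac{1}{2}(\sqrt{14} + \sqrt{6})$ found above, and `denest(3, 2, 2)` gives (2, 1), that is, $\sqrt{2} + 1$. For the last two samples in (6.5) the function returns `None`.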


It is harder to analyze a general quartic equation, and the roots can usually not
be written explicitly in a useful way. However, there is another partially symmetric
function which helps to determine the Galois group. Let $f(x)$ be an irreducible
quartic polynomial with roots $\{\alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ in a splitting field $K$. Then by Proposition
(6.1), its Galois group is a subgroup of $S_4$, and the roots form one orbit. The
transitive subgroups of $S_4$ are

(6.11) $S_4,\ A_4,\ D_4,\ C_4,\ V$,

where $V$ is the group (6.9). Actually, there are three conjugate subgroups
isomorphic to $D_4$ and three conjugate subgroups isomorphic to $C_4$. The other subgroups are
uniquely determined. There are some other subgroups of $S_4$ which are isomorphic to
the Klein four group, but they are not transitive.

Let us ask for partially symmetric functions of the roots to distinguish these
groups. As we have seen, the element $\delta$ determines whether or not $G \subset A_4$. The
subgroups of $A_4$ in our list are $A_4$ and $V$. So $\delta \in F$ if and only if $G$ is one of these
two groups.

Next, we consider the partially symmetric polynomial

(6.12) $\beta_1(u) = u_1 u_3 + u_2 u_4$.

A permutation of the indices carries $\beta_1(u)$ to one of the three polynomials $\beta_i(u)$,
$i = 1, 2, 3$, where

$\beta_2(u) = u_1 u_2 + u_3 u_4$ and $\beta_3(u) = u_1 u_4 + u_2 u_3$.

Since $S_4$ has order 24, the stabilizer of $\beta_1(u)$ is of order 8; it is one of the three
dihedral groups $D_4$. The polynomial $(x - \beta_1(u))(x - \beta_2(u))(x - \beta_3(u))$ is left fixed by
all permutations of the variables $u_i$, so its coefficients are symmetric functions. They
can be computed explicitly in terms of the elementary symmetric functions.
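The orbit and stabilizer counts for $\beta_1(u)$ can be confirmed by enumeration. In the sketch below each $\beta_i$ is recorded as a set of two index pairs, with indices shifted to start at 0; this encoding and the names `BETA`, `act`, `stabilizer`, and `orbit` are assumptions made for the illustration.

```python
from itertools import permutations

# beta_1 = u1*u3 + u2*u4, beta_2 = u1*u2 + u3*u4, beta_3 = u1*u4 + u2*u3,
# each stored as a set of unordered index pairs (0-based indices).
BETA = [
    frozenset({frozenset({0, 2}), frozenset({1, 3})}),
    frozenset({frozenset({0, 1}), frozenset({2, 3})}),
    frozenset({frozenset({0, 3}), frozenset({1, 2})}),
]

def act(p, beta):
    """Apply the index permutation p to the polynomial beta."""
    return frozenset(frozenset(p[i] for i in pair) for pair in beta)

S4 = list(permutations(range(4)))

# Permutations fixing beta_1, and the set of polynomials reachable from it.
stabilizer = [p for p in S4 if act(p, BETA[0]) == BETA[0]]
orbit = {act(p, BETA[0]) for p in S4}

# The Klein four group (6.9) in the same 0-based encoding.
V = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]
```

The orbit has 3 elements and the stabilizer has 24/3 = 8 elements. It contains the 4-cycle `(1, 2, 3, 0)` and the group $V$, consistent with its being a dihedral group $D_4$.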
Going back to our quartic polynomial, we substitute the roots $\alpha_i$ into $\beta_j(u)$, to
obtain three elements $\beta_j(\alpha) = \beta_j \in K$. They form one orbit under the action of the
symmetric group on the roots. If they are distinct elements of $K$, then the stabilizer
of $\beta_1$ in $S_4$ will have order 8, so it will be the dihedral group $D_4$. We are lucky: The
$\beta_j$ are distinct. For example,

$\beta_1 - \beta_2 = \alpha_1\alpha_3 + \alpha_2\alpha_4 - \alpha_1\alpha_2 - \alpha_3\alpha_4 = (\alpha_1 - \alpha_4)(\alpha_3 - \alpha_2)$.

Since we have assumed that $f$ is irreducible, its roots $\alpha_i$ are distinct. The right side of
this equation shows that $\beta_1 - \beta_2 \ne 0$.

Since the Galois group $G$ permutes the elements $\beta_j$, the polynomial
$g(x) = (x - \beta_1)(x - \beta_2)(x - \beta_3)$ has coefficients in $F$. It is called the resolvent cubic of
the quartic polynomial $f(x)$.

Though the symmetric group acts transitively on $\{\beta_1, \beta_2, \beta_3\}$, the Galois group
$G$, which is a subgroup of $S_4$, may not act transitively. Whether or not it does
provides information about $G$. If $G$ fixes $\beta_1$, for example, then $G$ is contained in the
stabilizer $D_4$ of $\beta_1$. In this case $\beta_1$ will be in the field $F$ (1.9), so the resolvent cubic
will have a root in $F$. Proceeding as in the proof of Proposition (6.4), we find the
following:


(6.13) Proposition. Let $g(x)$ be the resolvent cubic of an irreducible quartic
polynomial $f(x)$, and let $K$ be a splitting field of $f$. Then $g(x)$ has a root in $F$ if and only
if the Galois group $G = G(K/F)$ is a subgroup of one of the dihedral groups $D_4$. In
any case, if $\beta$ is a root of $g(x)$ in $K$, then the Galois group $G(K/F(\beta))$ is a subgroup
of a dihedral group $D_4$. □


Thus the polynomials $x^2 - D$, where $D$ is the discriminant, and the resolvent
cubic $g(x)$ nearly suffice to describe the Galois group. The results are summed up in
this table:


(6.14) Table.

                    D a square in F     D not a square
    g reducible     G = V               G = D_4 or C_4
    g irreducible   G = A_4             G = S_4

Explicit computation for arbitrary quartic equations becomes unpleasant, but
we can easily calculate the discriminant of a quartic which has the form

(6.15) $x^4 + rx + s$.

The discriminant is a symmetric polynomial of degree 12 and therefore has weighted
degree 12 in the elementary symmetric functions $s_1,\dots,s_4$. Substituting $(0, 0, -r, s)$
for $(s_1, s_2, s_3, s_4)$ into the unknown formula for the discriminant will kill any
monomial involving $s_1$ or $s_2$. And the only monomials of weighted degree 12 which do not
involve $s_1$ and $s_2$ are $s_3^4$ and $s_4^3$. Thus the discriminant of (6.15) has the form

$D = \Delta(0, 0, -r, s) = cr^4 + c's^3$.

We can determine the coefficients $c, c'$ by computing the discriminant of two
particular polynomials. The answer is

(6.16) $D = -27r^4 + 256s^3$.

For example, the discriminant of

(6.17) $x^4 + 8x + 12$

is $3^4 \cdot 2^{12}$. This is a square in $\mathbb{Q}$. The Galois group of the splitting field of (6.17)
over $\mathbb{Q}$ is therefore a subgroup of $A_4$.

To calculate the resolvent cubic $g(x)$ of the polynomial (6.15), we write the
resolvent cubic for the general polynomial whose roots are $u_1,\dots,u_4$ as

$g(x) = x^3 - b_1 x^2 + b_2 x - b_3$.

Then since $\beta_j$ is a quadratic function in $\{u_i\}$, $b_i$ has degree $2i$ in $\{u_j\}$ and weighted
degree $2i$ in the symmetric functions. Proceeding as above, one finds

(6.18) $g(x) = x^3 - 4sx - r^2$.

The resolvent cubic of the particular quartic polynomial (6.17) is $x^3 - 48x - 64$.
The quartic (6.17) and its resolvent cubic are both irreducible over $\mathbb{Q}$. It follows that
$G = A_4$ for the polynomial (6.17).
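Formulas (6.16) and (6.18) together with Table (6.14) give a mechanical test for quartics of the form $x^4 + rx + s$ with integer coefficients. The Python sketch below is a minimal version of that test over $\mathbb{Q}$. It assumes, without checking, that the quartic is irreducible; the function names are hypothetical. The resolvent is tested with the rational root theorem: a rational root of a monic integer cubic is an integer dividing the constant term.

```python
from math import isqrt

def is_square(n):
    """True if the integer n is the square of an integer."""
    return n >= 0 and isqrt(n) ** 2 == n

def discriminant(r, s):
    """Discriminant (6.16) of x^4 + r*x + s."""
    return -27 * r**4 + 256 * s**3

def resolvent_has_rational_root(r, s):
    """Does g(x) = x^3 - 4*s*x - r^2 from (6.18) have a rational root?

    A cubic is reducible over Q exactly when it has a rational root.
    """
    if r == 0:
        return True  # the constant term vanishes, so x = 0 is a root

    def g(x):
        return x**3 - 4 * s * x - r * r

    divisors = [d for d in range(1, r * r + 1) if (r * r) % d == 0]
    return any(g(d) == 0 or g(-d) == 0 for d in divisors)

def galois_group(r, s):
    """Look up Table (6.14); assumes x^4 + r*x + s is irreducible over Q."""
    square = is_square(discriminant(r, s))
    reducible = resolvent_has_rational_root(r, s)
    if square:
        return "V" if reducible else "A4"
    return "D4 or C4" if reducible else "S4"
```

For the polynomial (6.17), `discriminant(8, 12)` equals $3^4 \cdot 2^{12} = 331776 = 576^2$ and `galois_group(8, 12)` returns `"A4"`, in agreement with the text. The divisor search is a brute-force choice that is adequate only for small $r$.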


7. KUMMER EXTENSIONS


Let us now consider the splitting field over a field $F$ of a polynomial of the form

(7.1) $f(x) = x^p - a$,

where $p$ is a prime. We will assume that the base field $F$ is a subfield of $\mathbb{C}$ which
contains the primitive $p$th root of unity $\zeta_p = e^{2\pi i/p}$. The complex roots of $f(x)$ are
the $p$th roots of $a$, and if $\alpha$ denotes a particular $p$th root, then the roots of $f(x)$ are

(7.2) $\alpha,\ \zeta\alpha,\ \zeta^2\alpha,\ \dots,\ \zeta^{p-1}\alpha$,

where $\zeta = \zeta_p$. Therefore the splitting field is generated by a single root: $K = F(\alpha)$.


(7.3) Proposition. Let $F$ be a subfield of $\mathbb{C}$ which contains the $p$th root of unity
$\zeta_p$, and let $a$ be an element of $F$ which is not a $p$th power in $F$. Then the splitting
field of $f(x) = x^p - a$ has degree $p$ over $F$, and its Galois group is a cyclic group of
order $p$.


Proof. Let $K$ be a splitting field of $f$, and let $\alpha$ be one of its roots in $K$.
Since $a$ is not a $p$th power in $F$, $\alpha$ is not in $F$. Then there is an automorphism
$\sigma$ of $K/F$ which does not fix $\alpha$. Since the roots of $f$ are $\zeta^i\alpha$, $i = 0,\dots,p - 1$,
$\sigma(\alpha) = \zeta^\nu\alpha$ for some $\nu \ne 0$. We now compute the powers of $\sigma$. Remembering
that $\sigma$ is an automorphism and that $\sigma(\zeta) = \zeta$ because $\zeta \in F$, we find
$\sigma^2(\alpha) = \sigma(\zeta^\nu\alpha) = \zeta^\nu\sigma(\alpha) = \zeta^{2\nu}\alpha$. Similarly, $\sigma^i(\alpha) = \zeta^{i\nu}\alpha$ for each $i$.
Since $\zeta$ is a primitive $p$th root of unity and $p$ is prime, the smallest positive power of
$\sigma$ which fixes $\alpha$ is $\sigma^p$. Hence the order of $\sigma$ in the Galois group is at least $p$. On the
other hand, $\alpha$ generates $K$ over $F$, and $\alpha$ is a root of the polynomial $x^p - a$ of
degree $p$, so $[K:F] \le p$. This shows at the same time that $[K:F] = p$, that $x^p - a$
is irreducible over $F$, and that $G(K/F)$ is cyclic of order $p$. □


Here is a striking converse to Proposition (7.3): 


(7.4) Theorem. Let $F$ be a subfield of $\mathbb{C}$ which contains the $p$th root of unity $\zeta_p$,
and let $K/F$ be a Galois extension of degree $p$. Then $K$ is obtained by adjoining a
$p$th root to $F$.


Extensions of this type are often called Kummer extensions. For $p = 2$, the theorem
reduces to a familiar assertion: Every extension of degree 2 can be obtained by
adjoining a square root. But suppose that $p = 3$ and that $F$ contains $\zeta_3$. If the
discriminant of the irreducible cubic polynomial (2.3) is a square in $F$, then the splitting field
of $f$ has degree 3 [see (2.16)], so its Galois group is a cyclic group. Therefore the
splitting field of such a polynomial has the form $F(\sqrt[3]{a})$, for some $a \in F$. This isn't
obvious.


Proof of Theorem (7.4). The Galois group $G$ has prime order $p = [K:F]$, so
it is a cyclic group. Any element $\sigma$, not the identity, will generate it. Let us view $K$
as an $F$-vector space. Then $\sigma$ is a linear operator on $K$. For, since $\sigma$ is an
$F$-automorphism,

$\sigma(\alpha + \beta) = \sigma(\alpha) + \sigma(\beta)$ and $\sigma(c\alpha) = \sigma(c)\sigma(\alpha) = c\sigma(\alpha)$,

for all $c \in F$ and $\alpha, \beta \in K$. Since $G$ is a cyclic group of order $p$, $\sigma^p = 1$. An
eigenvalue $\lambda$ for this operator must satisfy the relation $\lambda^p = 1$, which means that $\lambda$ is a
power of $\zeta$. By hypothesis, these eigenvalues are in the field $F$. Moreover, there is at
least one eigenvalue different from 1. This is a fact about any linear operator $T$ such
that some power of $T$ is the identity, because such a linear operator can be
diagonalized [Chapter 9 (2.3)]. Its eigenvalues are the entries of the diagonal matrix $\Lambda$ which
represents it. If $T$ is not the identity, as is the case here, then $\Lambda \ne I$, so some
diagonal entry is different from 1.

We choose an eigenvector $\alpha$ with an eigenvalue $\zeta^i \ne 1$. Then $\sigma(\alpha) = \zeta^i\alpha$
and hence $\sigma(\alpha^p) = \sigma(\alpha)^p = (\zeta^i\alpha)^p = \zeta^{ip}\alpha^p = \alpha^p$. So $\sigma$ fixes $\alpha^p$. Since $\sigma$
generates $G$, the element $\alpha^p$ is in the fixed field $K^G$, which is $F$ (1.9). We have therefore
found an element $\alpha \in K$ whose $p$th power is in $F$. Since $\sigma(\alpha) \ne \alpha$, the element $\alpha$
is not in $F$ itself. Since $[K:F]$ is prime, $\alpha$ generates $K$. □


(7.5) Example. Consider the cyclic cubic polynomial (2.12) $x^3 - 3x + 1$. Let
$\{\eta_1, \eta_2, \eta_3\}$ denote its roots. There is an element $\sigma \in G(K/F)$ acting as a cyclic
permutation. We choose the basis $(1, \eta_1, \eta_2)$ for $K$ over $F = \mathbb{Q}(\zeta_3)$. (Why is it a
basis?) With respect to this basis, the matrix of the linear operator $\sigma$ is

$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix}$,

because $\sigma(1) = 1$, $\sigma(\eta_1) = \eta_2$, $\sigma(\eta_2) = \eta_3 = -\eta_1 - \eta_2$. The vector $(0, 1, -\zeta_3)^t$
is an eigenvector with eigenvalue $\zeta_3$. Thus if $\alpha = \eta_1 - \zeta_3\eta_2$, then $\alpha^3$ is an element
of $F$, and $\alpha$ generates the splitting field of $x^3 - 3x + 1$ over $F$. We can compute $\alpha^3$
explicitly, using the fact that $\eta_1 = \zeta_9 + \zeta_9^8$ and $\eta_2 = \zeta_9^2 + \zeta_9^7$. Noting that
$\zeta_3 = \zeta_9^3$, we find $\alpha = \zeta_9^8 - \zeta_9^5$ and $\alpha^3 = 3(1 - \zeta_3)$. □
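The identities in this example can be checked numerically with complex floating-point arithmetic. The sketch below is only a sanity check, not a proof; the helper `zeta` and the variable names are assumptions made for the illustration, and the roots are labeled so that $\sigma$ is induced by $\zeta_9 \mapsto \zeta_9^2$.

```python
import cmath

def zeta(n, k=1):
    """The complex number e^{2*pi*i*k/n}, a power of the n-th root of unity."""
    return cmath.exp(2j * cmath.pi * k / n)

# Roots of x^3 - 3x + 1, written as sums of 9th roots of unity.
eta1 = zeta(9, 1) + zeta(9, 8)
eta2 = zeta(9, 2) + zeta(9, 7)
eta3 = zeta(9, 4) + zeta(9, 5)

z3 = zeta(3)

# The eigenvector alpha = eta1 - zeta_3 * eta2, and its image under sigma,
# where sigma sends eta1 -> eta2 -> eta3 and fixes zeta_3.
alpha = eta1 - z3 * eta2
sigma_alpha = eta2 - z3 * eta3

def close(x, y, tol=1e-9):
    """Approximate equality of complex numbers."""
    return abs(x - y) < tol
```

With these values, `close(sigma_alpha, z3 * alpha)` and `close(alpha**3, 3 * (1 - z3))` both hold, matching the eigenvalue $\zeta_3$ and the value of $\alpha^3$ stated above.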


(7.6) Example. Let $f(x)$ be an arbitrary irreducible cubic polynomial over a field $F$,
and let $K$ be a splitting field of $f(x)(x^3 - 1)$ over $F$. Let $L \subset K$ be the intermediate
field generated by $\zeta$ and $\delta = \sqrt{D}$, where $D$ is the discriminant of $f$. Then $[L:F]$
divides 4, and $[K:L] = 3$, by (2.16). The four elements $\{1, \sqrt{D}, \sqrt{-3}, \sqrt{-3D}\}$
span $L$ as $F$-vector space in any case. By Theorem (7.4), $K = L(\sqrt[3]{b})$, for some
$b \in L$. Therefore the roots of $f(x)$ admit some expression in terms of a cube root of
the form

$\sqrt[3]{c_1 + c_2\sqrt{D} + c_3\sqrt{-3} + c_4\sqrt{-3D}}$, with $c_i \in F$. □


8. CYCLOTOMIC EXTENSIONS

The subfield $K$ of the complex numbers which is generated over $\mathbb{Q}$ by $\zeta_n = e^{2\pi i/n}$ is
called a cyclotomic field. Also, for any subfield $F$ of $\mathbb{C}$, the field $F(\zeta_n)$ is called a
cyclotomic extension of $F$. It is the splitting field over $F$ of the polynomial

(8.1) $x^n - 1$.

If we denote $\zeta_n$ by $\zeta$, the roots of this polynomial are the powers of $\zeta$, the $n$th roots
of unity $1, \zeta, \zeta^2, \dots, \zeta^{n-1}$. We will concentrate on the case that $n$ is a prime integer $p$
different from 2 in this section.

The polynomial $x^{p-1} + \cdots + x + 1$ is irreducible over $\mathbb{Q}$, and $\zeta = \zeta_p$ is one
of its roots [Chapter 11 (4.6)]. So it is the irreducible polynomial for $\zeta$ over $\mathbb{Q}$. Its
roots are the powers $\zeta, \zeta^2, \dots, \zeta^{p-1}$. Hence the Galois group of $\mathbb{Q}(\zeta)$ over $\mathbb{Q}$ has
order $p - 1$.


(8.2) Proposition. Let $p$ be a prime integer, and let $\zeta = \zeta_p$.


(a) The Galois group of $\mathbb{Q}(\zeta)$ over $\mathbb{Q}$ is isomorphic to the multiplicative group $\mathbb{F}_p^\times$
of nonzero elements of the prime field $\mathbb{F}_p$. It is a cyclic group of order $p - 1$.


(b) For any subfield $F$ of $\mathbb{C}$, the Galois group of $F(\zeta)$ over $F$ is a cyclic group.

Proof. Let $G$ be the Galois group of $K = F(\zeta)$ over $F$. We define a map
$v: G \to \mathbb{F}_p^\times$ as follows: Let $\sigma \in G$ be an automorphism. It will carry $\zeta$ to another
root of the polynomial $x^{p-1} + \cdots + x + 1$, say to $\zeta^i$. The exponent $i$ is determined as
an integer modulo $p$, because $\zeta$ has multiplicative order $p$. We set $v(\sigma) = i$. Let us
verify that $v$ is multiplicative: If $\tau$ is another element of $G$ such that $v(\tau) = j$, that
is, $\tau(\zeta) = \zeta^j$, then

(8.3) $\sigma\tau(\zeta) = \sigma(\zeta^j) = \sigma(\zeta)^j = \zeta^{ij}$.

Also, the identity automorphism sends $\zeta$ to $\zeta$, and hence $v(1) = 1$. Since $v$ is
compatible with multiplication and $v(\sigma) \ne 0$, $v$ is a homomorphism to $\mathbb{F}_p^\times$. The
homomorphism is injective because, since $\zeta$ generates $K$, the action of an automorphism is
determined when we know its action on $\zeta$. Thus $G$ is isomorphic to its image in $\mathbb{F}_p^\times$.
Since $\mathbb{F}_p^\times$ is a cyclic group, so is every subgroup. Therefore $G$ is cyclic. If $F = \mathbb{Q}$,
then $|G| = |\mathbb{F}_p^\times| = p - 1$, so these two groups are isomorphic. □


Suppose that $F = \mathbb{Q}$. Then being cyclic and of order $p - 1$, the Galois group
$G$ of $K = \mathbb{Q}(\zeta_p)$ has exactly one subgroup of order $k$ for each integer $k$ which divides
$p - 1$. If $(p - 1)/k = r$ and if $\sigma$ is a generator for $G$, then the subgroup of order $k$
is generated by $\sigma^r$. So by the Main Theorem of Galois theory, there will be exactly
one intermediate field $L$ with $[L:\mathbb{Q}] = r$. These fields are generated by certain
sums of powers of $\zeta = \zeta_p$. We will illustrate this by some simple examples.
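Under the isomorphism of Proposition (8.2), the subgroup generated by $\sigma^r$ corresponds to a subgroup of $\mathbb{F}_p^\times$, which is easy to list. The following Python sketch finds a generator of $\mathbb{F}_p^\times$ by trial and returns the unique subgroup of a given order. It assumes that $p$ is prime and that $k$ divides $p - 1$, and the function names are hypothetical.

```python
def multiplicative_order(a, p):
    """Order of a in the multiplicative group mod the prime p (0 < a < p)."""
    order, x = 1, a % p
    while x != 1:
        x = x * a % p
        order += 1
    return order

def primitive_root(p):
    """Smallest generator of the cyclic group of nonzero residues mod p."""
    for g in range(1, p):
        if multiplicative_order(g, p) == p - 1:
            return g

def subgroup_of_order(p, k):
    """The unique subgroup of order k, for k dividing p - 1.

    The element h below plays the role of sigma^r with r = (p - 1)/k.
    """
    r = (p - 1) // k
    h = pow(primitive_root(p), r, p)
    return sorted({pow(h, e, p) for e in range(k)})
```

For $p = 7$ this gives the subgroups [1, 6] and [1, 2, 4] of orders 2 and 3. Summing the powers of $\zeta$ with exponents in a coset of such a subgroup is one way to produce the "sums of powers" mentioned above.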

The simplest case is $p = 5$. Then $[K:\mathbb{Q}] = 4$, and there is an intermediate
field of degree 2 over $\mathbb{Q}$. It is generated by $\eta = \zeta + \zeta^4 = 2\cos 2\pi/5$. Since
$2\cos 2\pi/5 = \frac{1}{2}(-1 + \sqrt{5})$, the intermediate field is the quadratic number field
$\mathbb{Q}(\sqrt{5})$.


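(8.4) Proposition. The subfield $L$ of $K = \mathbb{Q}(\zeta_p)$ whose degree over $\mathbb{Q}$ is $\frac{1}{2}(p - 1)$
is generated over $\mathbb{Q}$ by the element $\eta = \zeta + \zeta^{p-1} = 2\cos 2\pi/p$. Moreover,
$L = K \cap \mathbb{R}$.


Since $L = K \cap \mathbb{R}$, $L$ is also called the real subfield of $K$.


Proof. Notice that $\zeta$ is a root of the quadratic equation $x^2 - \eta x + 1$, which
has coefficients in $\mathbb{Q}(\eta)$. Therefore $[K:\mathbb{Q}(\eta)] \le 2$. On the other hand, $\eta$ is a real
number, while $\zeta$ is not real, so $\mathbb{Q}(\eta) \subsetneq K$. It follows that $[K:\mathbb{Q}(\eta)] = 2$, that
$\mathbb{Q}(\eta) = K \cap \mathbb{R}$, and that $[\mathbb{Q}(\eta):\mathbb{Q}] = \frac{1}{2}(p - 1)$. □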


When $p = 7$, $\eta = \zeta + \zeta^6$ has degree 3 over $\mathbb{Q}$. Its irreducible polynomial
over $\mathbb{Q}$ can be computed by a method which we have used before (2.12). We guess
that the other roots are $\eta_2 = \zeta^2 + \zeta^5$ and $\eta_3 = \zeta^3 + \zeta^4$. These are the other sums
of a $p$th root and its inverse. It is not hard to show that $\{\eta_1, \eta_2, \eta_3\}$ is the $G$-orbit of
$\eta = \eta_1$, so this guess can be justified formally. We expand
$(x - \eta_1)(x - \eta_2)(x - \eta_3)$ and use the relation $\zeta^6 + \cdots + \zeta + 1 = 0$, obtaining
the irreducible equation $x^3 + x^2 - 2x - 1$ for $\eta$ over $\mathbb{Q}$.

The cyclotomic field $\mathbb{Q}(\zeta_7)$ also contains a quadratic extension of $\mathbb{Q}$. It is
generated by $\varepsilon = \zeta + \zeta^2 + \zeta^4$. If we set $\varepsilon' = \zeta^3 + \zeta^5 + \zeta^6$, then $(x - \varepsilon)(x - \varepsilon') =
x^2 + x + 2$ is its irreducible equation. The discriminant of this polynomial is $-7$, so
$\mathbb{Q}(\varepsilon) = \mathbb{Q}(\sqrt{-7})$. It follows that $\mathbb{Q}(\zeta_7)$ contains $\sqrt{-7}$.
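Both expansions for $p = 7$ can be confirmed numerically by multiplying out the linear factors with floating-point roots. This is a sanity check rather than a derivation; the helper names `zeta_power`, `period`, and `poly_from_roots` are assumptions made for this sketch.

```python
import cmath

def zeta_power(p, k):
    """The complex number e^{2*pi*i*k/p}."""
    return cmath.exp(2j * cmath.pi * k / p)

def period(p, exponents):
    """Sum of the powers zeta^k for k in the given exponents."""
    return sum(zeta_power(p, k) for k in exponents)

def poly_from_roots(roots):
    """Coefficients, highest degree first, of the product of (x - root)."""
    coeffs = [1]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

# eta_k = zeta^k + zeta^{-k} for k = 1, 2, 3.
etas = [period(7, (k, 7 - k)) for k in (1, 2, 3)]
cubic = poly_from_roots(etas)

# epsilon and epsilon' as sums over {1, 2, 4} and {3, 5, 6}.
eps, eps_prime = period(7, (1, 2, 4)), period(7, (3, 5, 6))
quadratic = poly_from_roots([eps, eps_prime])
```

Up to rounding error, `cubic` has coefficients 1, 1, -2, -1 and `quadratic` has coefficients 1, 1, 2, in agreement with $x^3 + x^2 - 2x - 1$ and $x^2 + x + 2$.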

Suppose that $p = 17$. Then $[\mathbb{Q}(\zeta):\mathbb{Q}] = 16$. A cyclic group of order 16
contains a chain of subgroups $C_{16} \supset C_8 \supset C_4 \supset C_2 \supset C_1$. By the Main Theorem of
Galois theory, there is a corresponding chain of intermediate fields $\mathbb{Q} \subset F_1 \subset
F_2 \subset F_3 \subset \mathbb{Q}(\zeta)$, of degrees 1, 2, 4, 8, 16 over $\mathbb{Q}$. The field $F_3$ of degree 8 is the real
subfield generated by $\eta = 2\cos 2\pi/17$, as in Proposition (8.4). Since each
extension in this chain has degree 2, $F_3$ can be reached by a succession of three square
root adjunctions. This proves that $2\cos 2\pi/17$, and hence the regular 17-gon, can
be constructed by ruler and compass [Chapter 13 (4.9)].

The other field extension which we will describe for all primes is the one of
degree 2 over $\mathbb{Q}$. The Main Theorem of Galois theory tells us that there is a unique
intermediate field $L$ of degree 2 over $\mathbb{Q}$, corresponding to the subgroup $H$ of $G$ of
order $\frac{1}{2}(p - 1)$. If $\sigma$ generates $G$, then $H$ is generated by $\sigma^2$.


(8.5) Theorem. Let $p$ be an odd prime, and let $L$ be the unique quadratic
extension of $\mathbb{Q}$ contained in the cyclotomic field $\mathbb{Q}(\zeta_p)$. Then

$L = \mathbb{Q}(\sqrt{\pm p})$,

where the sign is $(-1)^{(p-1)/2}$.


Proof. We need to select a generator of $L$ whose equation is easy to determine.
Gauss's method is to take the sum of half of the powers of $\zeta$, suitably chosen.

There is another choice of generator for $L$ which is a little simpler to work
with. Let $D$ be the discriminant of the polynomial

(8.6) $x^p - 1$.

This discriminant can be computed directly in terms of the roots $\{1, \zeta, \zeta^2, \dots, \zeta^{p-1}\}$,
but it is easier to determine $D$ using the following nice formula:

(8.7) Lemma. Let $f(x) = (x - \alpha_1) \cdots (x - \alpha_n)$. The discriminant of $f$ is

$D = \pm f'(\alpha_1) \cdots f'(\alpha_n) = \pm \prod_i f'(\alpha_i)$,

where $f'$ is the derivative.


Proof. By the product rule for differentiation,

$f'(x) = \sum_i (x - \alpha_1) \cdots \widehat{(x - \alpha_i)} \cdots (x - \alpha_n)$,

where the hat indicates that the factor is omitted. Therefore

$f'(\alpha_i) = (\alpha_i - \alpha_1) \cdots (\alpha_i - \alpha_{i-1})(\alpha_i - \alpha_{i+1}) \cdots (\alpha_i - \alpha_n)$.

This is the product of the differences $(\alpha_i - \alpha_j)$, with the given $i$ and with $j \ne i$.
Thus

$\prod_i f'(\alpha_i) = \prod_{i \ne j} (\alpha_i - \alpha_j) = \pm D$. □
We apply this lemma to our polynomial $x^p - 1$. Its derivative is $px^{p-1}$, so the
discriminant is

$$D = \pm \prod_{i=0}^{p-1} p\,\zeta^{i(p-1)} = \pm p^p \zeta^N,$$

where the exponent $N$ is some integer. To determine $\zeta^N$, we note that $D$ is a rational
number, because the coefficients of $x^p - 1$ are rational. The only power of $\zeta$ which
is rational is 1. Therefore $\zeta^N = 1$ and

(8.8)    $D = \pm p^p.$

The square root of this discriminant is $\delta = \sqrt{\pm p^p}$. It is in the field $\mathbb{Q}(\zeta)$.
Since $p$ is odd and since square factors can be pulled out of a square root,

(8.9)    $\mathbb{Q}(\delta) = \mathbb{Q}(\sqrt{\pm p}).$

Therefore this field is a quadratic subfield of $\mathbb{Q}(\zeta)$, and since $L$ is the only quadratic
subfield, it is $L$. We leave the determination of the sign as an exercise. □
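The formula of Lemma (8.7) and the value $D = \pm p^p$ can be checked numerically for small primes. The sketch below is an illustration under stated assumptions, not a proof: it takes the discriminant to be the product of $(\alpha_i - \alpha_j)^2$ over pairs $i < j$, and the function names are hypothetical. It computes that product and the product of the derivative values $f'(\alpha_i) = p\alpha_i^{p-1}$ in floating point.

```python
import cmath

def roots_of_unity(p):
    """The p complex roots of x^p - 1."""
    return [cmath.exp(2j * cmath.pi * k / p) for k in range(p)]

def discriminant_from_roots(roots):
    """Product of (a_i - a_j)^2 over all pairs i < j."""
    d = 1
    for i in range(len(roots)):
        for j in range(i + 1, len(roots)):
            d *= (roots[i] - roots[j]) ** 2
    return d

def derivative_product(p):
    """Product of f'(a) = p * a^(p-1) over the roots a of x^p - 1."""
    prod = 1
    for a in roots_of_unity(p):
        prod *= p * a ** (p - 1)
    return prod

for p in (3, 5, 7):
    d = discriminant_from_roots(roots_of_unity(p))
    print(p, round(d.real), round(derivative_product(p).real))
```

For each prime tried, the two printed integers agree up to sign, and their common absolute value is $p^p$, consistent with (8.8).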


The following theorem, first stated by Kronecker, is one of the most beautiful 
theorems of algebraic number theory. Unfortunately, it would take too long to prove 
it here. 


(8.10) Theorem. Every Galois extension $K$ of $\mathbb{Q}$ whose Galois group is abelian is
contained in one of the cyclotomic fields $\mathbb{Q}(\zeta_n)$. □


9. QUINTIC EQUATIONS


The main motivation behind Galois’ work was the problem of solving fifth-degree 
equations. We are going to study his solution in this section. A short time earlier, 
Abel had shown that the quintic 


(9.1)    $x^5 + a_4x^4 + a_3x^3 + a_2x^2 + a_1x + a_0$


with variable coefficients $a_i$ could not be solved in terms of radicals, but it remained
to find an explicit polynomial with rational coefficients which couldn't be solved.
Anyhow, because the problem was over 200 years old, interest in it continued. In 
the meantime, Galois’ ideas have turned out to be much more important than the 
question which motivated them. 

An expression in terms of radicals may become very complicated, and I don't
know a good notation for a general one. However, it is easy to give a precise
recursive definition. Let $F$ be an arbitrary subfield of the complex numbers. We say
that a complex number $\alpha$ is expressible by radicals over $F$ if there is a tower of
subfields $F = F_0 \subset F_1 \subset \cdots \subset F_r$ of $\mathbb{C}$ such that

(9.2)
(i) $\alpha \in F_r$, and
(ii) for every $j = 1, \ldots, r$, $F_j$ is generated over $F_{j-1}$ by a radical $\beta_j$. In other words,
$F_j = F_{j-1}(\beta_j)$, and for some integer $n_j$, $\beta_j^{n_j} \in F_{j-1}$.


This definition is formally similar to the description [Chapter 13 (4.9)] of the real 
numbers which can be constructed by ruler and compass. In that description, only 
square roots of positive real numbers are allowed. 


(9.3) Proposition. Let $\alpha$ be a root of a polynomial $f(x) \in F[x]$ of degree $\leq 4$.
Then $\alpha$ is expressible by radicals over $F$.


Proof. For quadratic polynomials, this is the quadratic formula. For cubics,
Cardano's formula gives the solution. Suppose that $f(x)$ is quartic. If $f$ is reducible,
then $\alpha$ is a root of a polynomial of lower degree, and the problem is solved. If not,
then $f$ has distinct roots in a splitting field $K$, so its discriminant $D$ is not zero. Let
$g(x)$ be the resolvent cubic of $f$. We proceed by adjoining the square root $\delta$ of $D$,
obtaining a field $F_1$ (possibly equal to $F$). Next, we use Cardano's formula to solve
the resolvent cubic. This will require a square root extension $F_2$ followed by a cube
root extension $F_3$. At this point, Table (6.14) shows that the Galois group of $K/F_3$ is
a subgroup of the Klein four group. Therefore $K$ can be reached by a sequence of at
most two more square root extensions $F_3 \subset F_4 \subset F_5 = K$. □


The $n$th roots of unity $\zeta_n = e^{2\pi i/n}$ are allowable in an expression by radicals.
Also, if $n = rs$, then $\sqrt[n]{b} = \sqrt[r]{\sqrt[s]{b}}$. So at the cost of adding more steps to the chain
of fields, we may assume that all the roots are $p$th roots, for various prime
integers $p$.

Note that there is a great deal of ambiguity in an expression by radicals,
because there are $n$ choices for each $\sqrt[n]{b}$. The notation $(-3 + \sqrt[4]{2})^{1/5}$ may stand for
any one of 20 complex numbers, so the tower of fields $\mathbb{Q} \subset \mathbb{Q}(\sqrt[4]{2}) \subset
\mathbb{Q}((-3 + \sqrt[4]{2})^{1/5})$ is not uniquely defined. This ambiguity is inherent in the
notation. Since the notation is cumbersome anyhow, we won't bother trying to make it
more precise. We won't use it very much.


(9.4) Proposition. Let $f(x)$ be an irreducible polynomial over a field $F$. If one root
of $f$ in $K$ can be expressed by radicals, so can any other root.


Proof. Suppose that one root $\alpha$ can be expressed by radicals, say using the
tower $F = F_0 \subset \cdots \subset F_r$. Choose a field $L$ which contains $F_r$ and which is a
splitting field of some polynomial of the form $f(x)g(x)$ over $F$. Then $L$ is also the
splitting field of $fg$ over $F(\alpha)$. Let $\alpha'$ be a root of $f$ in another field $K'$, and let $L'$ be a
splitting field of $fg$ over $F(\alpha')$. Then we can extend the isomorphism
$F(\alpha) \to F(\alpha')$ to an isomorphism $\varphi: L \to L'$ (5.2). The tower of fields
$F = \varphi(F_0) \subset \cdots \subset \varphi(F_r)$ shows that $\alpha'$ is expressible by radicals. □


(9.5) Proposition. Let $\alpha$ be a complex number which can be expressed by radicals
over $F$. Then a tower of fields $F = F_0 \subset \cdots \subset F_r = K$ can be found so that the
conditions (i) and (ii) of (9.2) hold and, in addition,

(iii) for each $j$, $F_j$ is a Galois extension of $F_{j-1}$ and the Galois group $G(F_j/F_{j-1})$ is a
cyclic group.


Proof. Consider the tower given in the definition (9.2), in which $F_r =
F(\beta_1, \ldots, \beta_r)$. As we have remarked, we may assume that $\beta_j^{p_j} \in F_{j-1}$ for some
prime integer $p_j$. Let $\zeta_{p_j} = e^{2\pi i/p_j}$ be the $p_j$-th root of 1. We form a new chain of
fields by adjoining the elements $(\zeta_{p_1}, \ldots, \zeta_{p_r}; \beta_1, \ldots, \beta_r)$ in that order. Theorem (7.4)
and Proposition (8.2) show that each of these extensions is Galois, with cyclic Galois
group. Some of the extensions in this tower may be trivial because of redundancy. If
so, we shorten the chain. Since the last field $F(\{\zeta_{p_j}\}, \{\beta_j\})$ in this chain contains $F_r$, it
contains $\alpha$. □


Let us consider the Galois group of a product of polynomials f(x) g(x) over F. 
Let K’ be a splitting field of fg. Then K’ contains a splitting field K of f, because f 
factors into linear factors in K'. Similarly, K’ contains a splitting field F’ of g. So 
we have a diagram of fields 


              K'
             /  \
(9.6)       K    F'
             \  /
              F

(9.7) Proposition. With the above notation, let $G = G(K/F)$, $H = G(F'/F)$,
and $\mathcal{G} = G(K'/F)$.

(a) $G$ and $H$ are quotients of $\mathcal{G}$.
(b) $\mathcal{G}$ is isomorphic to a subgroup of the product group $G \times H$.


Proof. The first assertion follows from the fact that $K$ and $F'$ are intermediate
fields which are Galois extensions of $F$ (5.7b). Let us denote the canonical
homomorphisms $\mathcal{G} \to G$, $\mathcal{G} \to H$ by subscripts: $\sigma \rightsquigarrow \sigma_f$ and $\sigma \rightsquigarrow \sigma_g$. Then $\sigma_f$
describes the way that $\sigma$ operates on the roots of $f$, and $\sigma_g$ describes the way it operates
on the roots of $g$. We map $\mathcal{G}$ to $G \times H$ by $\sigma \rightsquigarrow (\sigma_f, \sigma_g)$. If $\sigma_f$ and $\sigma_g$ are both the
identity, then $\sigma$ operates trivially on the roots of $fg$, and hence $\sigma = 1$. This shows
that the map $\mathcal{G} \to G \times H$ is injective and that $\mathcal{G}$ is isomorphic to a subgroup of
$G \times H$. □

(9.8) Proposition. Let $f$ be a polynomial over $F$ whose Galois group $G$ is a simple
nonabelian group. Let $F'$ be a Galois extension of $F$, with abelian Galois group. Let
$K'$ be a splitting field of $f$ over $F'$. Then the Galois group $G(K'/F')$ is isomorphic
to $G$.


This proposition is a key point. It tells us that if the Galois group of $f$ is a simple
nonabelian group, then we will not make any progress toward solving for its roots if
we replace $F$ by an abelian extension $F'$.


Proof of Proposition (9.8). We first reduce ourselves to the case that $[F' : F]$
is a prime number. To do this, we suppose that the proposition has been proved in that
case, and we choose a cyclic quotient group $H$ of $G(F'/F)$ of prime order. Such a
quotient exists because $G(F'/F)$ is abelian. This quotient determines an intermediate
field $F_1 \subset F'$ which is a Galois extension of $F$, and such that $G(F_1/F) = H$
(5.7). Let $K_1$ be the splitting field of $f$ over $F_1$. Then since $[F_1 : F]$ is a prime,
$G(K_1/F_1) \approx G$. So we may replace $F$ by $F_1$ and $K$ by $K_1$. Induction on $[F' : F]$ will
complete the proof.

So we may assume that $[F' : F] = p$ and that $H = G(F'/F)$ is a cyclic group
of order $p$. The splitting field $K'$ will contain a splitting field of $f$ over $F$, call it $K$.
We are then in the situation of Proposition (9.7). So the Galois group $\mathcal{G}$ of $K'$ over
$F$ is a subgroup of $G \times H$, and it maps surjectively to $G$. It follows that $|G|$ divides
$|\mathcal{G}|$, and $|\mathcal{G}|$ divides $|G \times H| = p|G|$. Since $p$ is prime, $|\mathcal{G}|$ is either $|G|$ or $p|G|$.
If $|\mathcal{G}| = |G|$, then counting degrees shows that $K' = K$. In this case, $K$ contains the
Galois extension $F'$, and hence $H$ is a quotient of $G$ (5.7b). Since $G$ is a nonabelian
simple group, its only quotients are $G$ itself and the trivial group, so this is impossible.
The only remaining possibility is that $\mathcal{G} = G \times H$. Applying the Main Theorem to the
chain of fields $F \subset F' \subset K'$, we conclude that $G(K'/F') \approx G$, as required. □


(9.9) Theorem. The roots of a quintic polynomial $f(x)$ whose Galois group is $S_5$
or $A_5$ can not be expressed by radicals over $F$.


Proof. Let $K$ be a splitting field of $f$. If $G = S_5$, then the discriminant of $f$ is
not a square in $F$. In that case, we replace $F$ by $F(\delta)$, where $\delta$ is a square root of the
discriminant in $K$. The Galois group $G(K/F(\delta))$ is $A_5$. Obviously, it is enough to
show that the roots of $f$ can not be expressed by radicals over the larger field $F(\delta)$.
This reduces the case that the group is $S_5$ to the case that it is $A_5$.

Suppose that the Galois group of $f$ is $A_5$ but that some root $\alpha$ of $f$ is expressible
by radicals over $F$. Say that $\alpha \in F_r$, where $F_r$ is the end of a chain of field
extensions $F = F_0 \subset \cdots \subset F_r$, each extension in the chain being Galois, with a cyclic
Galois group, as Proposition (9.5) allows. Now since the Galois group of $f$ over $F$ is a
simple group, Proposition (9.8) shows inductively that for each $i$, the Galois group of
$f$ over $F_i$ is $A_5$ too. On the other hand, since it has a root $\alpha$ in $F_r$, the polynomial $f$
will not remain irreducible over that field. Therefore the Galois group of $f$ over $F_r$ will
not operate transitively on the five roots of $f$ in a splitting field. In particular, the
Galois group can not be the alternating group. This is a contradiction, which shows that
the roots of $f$ are not expressible by radicals. □

We will now exhibit a specific quintic polynomial over $\mathbb{Q}$ whose Galois group
is $S_5$. The facts that 5 is prime and that the Galois group $G$ acts transitively on the
roots $\{\alpha_1, \ldots, \alpha_5\}$ limit the possible Galois groups greatly. For, since the action is
transitive, $|G|$ is divisible by 5. Thus $G$ contains an element of order 5. The only
elements of order 5 in $S_5$ are cyclic permutations such as $\sigma = (1\,2\,3\,4\,5)$.


(9.10) Lemma. If $G$ contains a transposition, then $G = S_5$.


Proof. By transposition $\tau$ we mean, as always, a permutation which interchanges
two indices. We may assume that $G$ contains the cyclic permutation $\sigma$
above. Renumbering if necessary, we may assume that $\tau$ acts as $(1\,i)$. We replace $\sigma$
by $\sigma^{i-1}$ and renumber again, to reduce to the case that $\tau$ is the transposition $(1\,2)$. It
remains only to verify that $\sigma$ and $\tau$ generate $S_5$, which is left as an exercise. □
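The final claim can be confirmed by brute force, which is a check rather than the proof requested in the exercise. The sketch below is an illustration under stated assumptions: permutations of the indices 0 through 4 are stored as tuples, and the helper names `compose` and `generated_group` are hypothetical. The subgroup is found by repeatedly multiplying by the generators until no new elements appear; in a finite group this closure is the generated subgroup.

```python
def compose(p, q):
    """Return the permutation i -> p[q[i]] (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def generated_group(generators):
    """Set of all permutations obtainable as products of the generators."""
    identity = tuple(range(len(generators[0])))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

sigma = (1, 2, 3, 4, 0)    # the 5-cycle (1 2 3 4 5), written on indices 0..4
tau = (1, 0, 2, 3, 4)      # the transposition (1 2)
flip = (0, 4, 3, 2, 1)     # the double transposition (2 5)(3 4), for contrast

print(len(generated_group([sigma, tau])))   # order of the group generated by sigma, tau
print(len(generated_group([sigma, flip])))  # a smaller transitive group
```

The first count is $120 = |S_5|$. The second pair generates a dihedral group of order 10, which shows that a transitive subgroup containing a 5-cycle need not be all of $S_5$ when no transposition is present.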


(9.11) Corollary. Suppose that the irreducible polynomial (9.1) has roots
$\{\alpha_1, \ldots, \alpha_5\}$, and let $K$ be its splitting field. If $F(\alpha_1, \alpha_2, \alpha_3) < K$, then $G(K/F)$ is the
symmetric group $S_5$.


For let $F' = F(\alpha_1, \alpha_2, \alpha_3)$. The only nontrivial permutation fixing $\alpha_1, \alpha_2, \alpha_3$ is the
transposition $(4\,5)$. If $F' \neq K$, this permutation must be in $G(K/F')$. Thus $G(K/F)$
contains a transposition. □


(9.12) Corollary. Let $f(x)$ be an irreducible quintic polynomial over $\mathbb{Q}$ with
exactly three real roots. Then its Galois group is the symmetric group, and hence its
roots can not be expressed by radicals.


For, call the real roots $\alpha_1, \alpha_2, \alpha_3$. Then $\mathbb{Q}(\alpha_1, \alpha_2, \alpha_3) \subset \mathbb{R}$, but since $\alpha_4, \alpha_5$ are not
real, $K$ is not a subfield of $\mathbb{R}$. So we can apply Corollary (9.11) to conclude that the
Galois group of $f$ is $S_5$. By Theorem (9.9), the roots of $f$ can not be expressed by
radicals. □


(9.13) Example. The polynomial $x^5 - 16x = x(x^2 - 4)(x^2 + 4)$ has three real
roots, but of course it is not irreducible. But we can add a small constant without
changing the number of real roots. This is seen by looking at the graph of the
polynomial. For instance,

$$x^5 - 16x + 2$$

still has three real roots, and it is irreducible by the Eisenstein Criterion [Chapter 10
(4.9)]. So its roots can not be expressed by radicals over $\mathbb{Q}$.
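Both properties of $x^5 - 16x + 2$ can be checked mechanically. The sketch below is an illustration with hypothetical helper names. It counts sign changes of the polynomial at a few sample points, which gives at least three real roots; since the derivative $5x^4 - 16$ has only two real zeros, Rolle's theorem allows at most three. It then tests the hypotheses of the Eisenstein Criterion for the prime 2.

```python
def f(x):
    """The quintic of Example (9.13)."""
    return x**5 - 16 * x + 2

def sign_changes(points):
    """Number of consecutive sample points at which f changes sign."""
    values = [f(x) for x in points]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

def eisenstein_applies(coeffs, p):
    """Check Eisenstein's hypotheses; coeffs run from leading to constant term."""
    lead, *middle, const = coeffs
    return (lead % p != 0
            and all(c % p == 0 for c in middle + [const])
            and const % (p * p) != 0)

print(sign_changes([-3, -1, 1, 3]))                 # sign changes between sample points
print(eisenstein_applies([1, 0, 0, 0, -16, 2], 2))  # Eisenstein at the prime 2
```

The sample points $-3, -1, 1, 3$ give values $-193, 17, -13, 197$, so there are three sign changes.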


Il paraît après cela qu'il n'y a aucun fruit à tirer
de la solution que nous proposons.


Évariste Galois

EXERCISES


1. The Main Theorem of Galois Theory


1. Determine the irreducible polynomial for $i + \sqrt{2}$ over $\mathbb{Q}$.

2. Prove that the set $(1, i, \sqrt{2}, i\sqrt{2})$ is a basis for $\mathbb{Q}(i, \sqrt{2})$ over $\mathbb{Q}$.

3. Determine the intermediate fields between $\mathbb{Q}$ and $\mathbb{Q}(\sqrt{2}, \sqrt{3})$.

4. Determine the intermediate fields of an arbitrary biquadratic extension without appealing
to the Main Theorem.

5. Prove that the automorphism of $\mathbb{Q}(\sqrt{2})$ sending $\sqrt{2}$ to $-\sqrt{2}$ is discontinuous.

6. Determine the degree of the splitting field of the following polynomials over $\mathbb{Q}$.


@ix* — 1 SEC Se n@ x* + 1 


7. Let $\alpha$ denote the positive real fourth root of 2. Factor the polynomial $x^4 - 2$ into
irreducible factors over each of the fields $\mathbb{Q}$, $\mathbb{Q}(\sqrt{2})$, $\mathbb{Q}(\sqrt{2}, i)$, $\mathbb{Q}(\alpha)$, $\mathbb{Q}(\alpha, i)$.


. Kemgne?”, 


(a) Prove that K = Q(€) is a splitting field for the polynomial x° — 1 over Q, and de- 
termine the degree [K : Q]. 
(b) Without using Theorem (1.11), prove that K is a Galois extension of Q, and deter- 


mine its Galois group. 


9. Let $K$ be a quadratic extension of the form $F(\alpha)$, where $\alpha^2 = a \in F$. Determine all
elements of $K$ whose squares are in $F$.

10. Let $K = \mathbb{Q}(\sqrt{2}, \sqrt{3}, \sqrt{5})$. Determine $[K : \mathbb{Q}]$, prove that $K$ is a Galois extension of $\mathbb{Q}$,
and determine its Galois group.


11. Let $K$ be the splitting field over $\mathbb{Q}$ of the polynomial $f(x) =
(x^2 - 2x - 1)(x^2 - 2x - 7)$. Determine $G(K/\mathbb{Q})$, and determine all intermediate
fields explicitly.


12. Determine all automorphisms of the field $\mathbb{Q}(\sqrt[3]{2})$.
13. Let $K/F$ be a finite extension. Prove that the Galois group $G(K/F)$ is a finite group.

14. Determine all the quadratic number fields $\mathbb{Q}[\sqrt{d}]$ which contain a primitive $p$th root of
unity, for some prime $p \neq 2$.

15. Prove that every Galois extension $K/F$ whose Galois group is the Klein four group is
biquadratic.

16. Prove or disprove: Let $f(x)$ be an irreducible cubic polynomial in $\mathbb{Q}[x]$ with one real root
$\alpha$. The other roots form a complex conjugate pair $\beta, \bar{\beta}$, so the field $L = \mathbb{Q}(\beta)$ has an
automorphism $\sigma$ which interchanges $\beta, \bar{\beta}$.

17. Let $K$ be a Galois extension of a field $F$ such that $G(K/F) \approx C_2 \times C_{12}$. How many
intermediate fields $L$ are there such that (a) $[L : F] = 4$, (b) $[L : F] = 9$,
(c) $G(K/L) \approx C_4$?

18. Let $f(x) = x^4 + bx^2 + c \in F[x]$, and let $K$ be the splitting field of $f$. Prove that
$G(K/F)$ is contained in a dihedral group $D_4$.

19. Let $F = \mathbb{F}_2(u)$ be the rational function field over the field of two elements. Prove that the
polynomial $x^2 - u$ is irreducible in $F[x]$ and that it has two equal roots in a splitting
field.

20. Let $F$ be a field of characteristic 2, and let $K$ be an extension of $F$ of degree 2.
(a) Prove that $K$ has the form $F(\alpha)$, where $\alpha$ is the root of an irreducible polynomial
over $F$ of the form $x^2 + x + a$, and that the other root of this equation is $\alpha + 1$.
(b) Is it true that there is an automorphism of $K$ sending $\alpha \rightsquigarrow \alpha + 1$?


2. Cubic Equations


1. Prove that the discriminant of a real cubic is positive if all the roots are real, and
negative if not.

2. Determine the Galois groups of the following polynomials.


(a) x3 -—2 (b) x2 + 27x -4 (©) x? t+x4+1 (ad) x2 4+ 3x4 14 
(eex’ — 3a + 1 (8) x? 2 + 7 Bra 
(hice x ht 


3. Let $f$ be an irreducible cubic polynomial over $F$, and let $\delta$ be the square root of the
discriminant of $f$. Prove that $f$ remains irreducible over the field $F(\delta)$.

4. Let $\alpha$ be a complex root of the polynomial $x^3 + x + 1$ over $\mathbb{Q}$, and let $K$ be a splitting
field of this polynomial over $\mathbb{Q}$.
(a) Is $\sqrt{-3}$ in the field $\mathbb{Q}(\alpha)$? Is it in $K$?
(b) Prove that the field $\mathbb{Q}(\alpha)$ has no automorphism except the identity.


5. Prove Proposition (2.16) directly for a cubic of the form (2.3), by determining the
formula which expresses $\alpha_2$ in terms of $\alpha_1, \delta, p, q$ explicitly.


6. Let $f \in \mathbb{Q}[x]$ be an irreducible cubic polynomial which has exactly one real root, and let
$K$ be its splitting field over $\mathbb{Q}$. Prove that $[K : \mathbb{Q}] = 6$.

7. When does the polynomial $x^3 + px + q$ have a multiple root?

8. Determine the coefficients $p, q$ which are obtained from the general cubic (2.1) by the
substitution (2.2).

9. Prove that the discriminant of the cubic $x^3 + px + q$ is $-4p^3 - 27q^2$.

3. Symmetric Functions


1. Derive the expression (3.10) for the discriminant of a cubic by the method of
undetermined coefficients.

2. Let $f(u)$ be a symmetric polynomial of degree $d$ in $u_1, \ldots, u_n$, and let
$f^0(u_1, \ldots, u_{n-1}) = f(u_1, \ldots, u_{n-1}, 0)$. Say that $f^0(u) = g(s^0)$, where $s^0$ are the
elementary symmetric functions in $u_1, \ldots, u_{n-1}$. Prove that if $n > d$, then $f(u) = g(s)$.

3. Compute the discriminant of a quintic polynomial of the form $x^5 + ax + b$.

4. With each of the following polynomials, determine whether or not it is a symmetric
function, and if so, write it in terms of the elementary symmetric functions.
(a) u)2u2 + u2u, (n = 2) 

(Db) u,2u2 + u2*u3 + us’, (n = 3) 

(c) (uy + u2)(u2 + u3)(u) + us) (n = 3) 

(d) upuz + u2°u3 + usu) — Uyu2? — U2U;? — uu, (n = 3) 

(Clear tage + 4 a, 


5. Find two natural bases for the ring of symmetric functions, as free module over the ring
$R$.


*6. Define the polynomials $w_1, \ldots, w_n$ in variables $u_1, \ldots, u_n$ by $w_k = u_1^k + \cdots + u_n^k$.
(a) Prove Newton's identities: $w_k - s_1w_{k-1} + s_2w_{k-2} - \cdots \pm s_{k-1}w_1 \mp ks_k = 0$.
(b) Do $w_1, \ldots, w_n$ generate the ring of symmetric functions?

7. Let $f(x) = x^3 + a_2x^2 + a_1x + a_0$. Prove that the substitution $x = x_1 - (a_2/3)$ does not
change the discriminant of a cubic polynomial.

8. Prove that $[F(u) : F(s)] = n!$ by induction, directly from the definitions.
- Let u;,..., %, be variables and let D, denote the discriminant. Define 


i, j#k 
(a) Prove that D2 is a symmetric polynomial, and compute its expression in terms of the 
elementary symmetric polynomials for the cases n = 2,3. 
(b) Let a),..., an be elements of a field of characteristic zero. Prove that D,(a,,...,@n) = 
DHa\,.... dn) = O if and only if the number of distinct elements in the set {a,,..., dn} 
Ie? 2 


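10. Compute the discriminants of the polynomials given in Section 2, exercise 2.

11. (Vandermonde determinant) (a) Prove that the determinant of the matrix

$$\begin{bmatrix} 1 & u_1 & u_1^2 & \cdots & u_1^{n-1} \\ 1 & u_2 & u_2^2 & \cdots & u_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & u_n & u_n^2 & \cdots & u_n^{n-1} \end{bmatrix}$$

is a constant multiple of $\delta(u)$.
(b) Determine the constant.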


4. Primitive Elements


1. Let $G$ be a group of automorphisms of a field $K$. Prove that the fixed elements $K^G$ form
a subfield of $K$.

2. Let $\alpha = \sqrt[3]{2}$, $\zeta = \frac{1}{2}(-1 + \sqrt{-3})$, $\beta = \alpha\zeta$.
(a) Prove that for all $c \in \mathbb{Q}$, $\gamma = \alpha + c\beta$ is the root of a sixth-degree polynomial of
the form $x^6 + ax^3 + b$.
(b) Prove that the irreducible polynomial for $\alpha + \beta$ is cubic.
(c) Prove that $\alpha - \beta$ has degree 6 over $\mathbb{Q}$.

3. For each of the following sets of automorphisms of the field of rational functions $\mathbb{C}(y)$,
determine the group of automorphisms which they generate, and determine the fixed
field explicitly.

(a) o(y)=y"! (b) oy) = iy © aly) = -y, T(y) =" @ oly) = &, Ty) = 
y', where {= €?/3 (e) o(y) =i, TW) =! 

(a) Show that the automorphisms o(y) = (y + i)/(y — id), Ty) = i — D/O + I) 
of C(y) generate a group isomorphic to the alternating group A,. 

*(b) Determine the fixed field of this group. 
*5. Let $F$ be a finite field, and let $f(x)$ be a nonconstant polynomial whose derivative is the
zero polynomial. Prove that $f$ is not irreducible over $F$.


5. Proof of the Main Theorem


1. Let K = Q(a), where a is a root of the polynomial x* + 2x + 1, and let g(x) = 
x? + x + 1. Does g(x) have a root in K? 

2. Let $f \in F[x]$ be a polynomial of degree $n$, and let $K$ be a splitting field for $f$. Prove that
$[K : F]$ divides $n!$.

3. Let $G$ be a finite group. Prove that there exists a field $F$ and a Galois extension $K$ of $F$
whose Galois group is $G$.

4. Assume it known that $\pi$ and $e$ are transcendental numbers. Let $K$ be the splitting field of
the polynomial $x^3 + \pi x + 6$ over the field $F = \mathbb{Q}(\pi)$.
(a) Prove that $[K : F] = 6$.
(b) Prove that $K$ is isomorphic to the splitting field of $x^3 + ex + 6$ over $\mathbb{Q}(e)$.

5. Prove the isomorphism F[x]/(f(x)) ~ F[x]/(f(x)) used in the proof of Lemma (5.1) 
formally, using the universal property of the quotient construction. 

6. Prove Corollary (5.5). 

7. Let $f(x)$ be an irreducible cubic polynomial over $\mathbb{Q}$ whose Galois group is $S_3$. Determine
the possible Galois groups of the polynomial (x? — 1) - f(x). 


8. Consider the diagram of fields

              K'
             /  \
            K    F'
             \  /
              F

in which $K$ is a Galois extension of $F$, and $K'$ is generated over $F$ by $K$ and $F'$. Prove
that $K'$ is a Galois extension of $F'$ and that its Galois group is isomorphic to a subgroup
of $G(K/F)$.

9. Let $K \supset L \supset F$ be fields. Prove or disprove:
(a) If $K/F$ is Galois, then $K/L$ is Galois.
(b) If $K/F$ is Galois, then $L/F$ is Galois.
(c) If $L/F$ and $K/L$ are Galois, then $K/F$ is Galois.

10. Let $K$ be a splitting field of an irreducible cubic polynomial $f(x)$ over a field $F$ whose
Galois group is $S_3$. Determine the group $G(F(\alpha)/F)$ of automorphisms of the extension
$F(\alpha)$.

11. Let $K/F$ be a Galois extension whose Galois group is the symmetric group $S_3$. Is it true
that $K$ is the splitting field of an irreducible cubic polynomial over $F$?

12. Let $K/F$ be a field extension of characteristic $p \neq 0$, and let $\alpha$ be a root in $K$ of an
irreducible polynomial $f(x) = x^p - x - a$ over $F$.
(a) Prove that $\alpha + 1$ is also a root of $f(x)$.
(b) Prove that the Galois group of $f$ over $F$ is cyclic of order $p$.


6. Quartic Equations 


1. Compute the discriminant of the quartic polynomial $x^4 + 1$, and determine its Galois
group over $\mathbb{Q}$.

2. Let $K$ be the splitting field of an irreducible quartic polynomial $f(x)$ over $F$, and let the
roots of $f(x)$ in $K$ be $\alpha_1, \alpha_2, \alpha_3, \alpha_4$. Also assume that the resolvent cubic $g(x)$ has a root,
say $\beta_1 = \alpha_1\alpha_2 + \alpha_3\alpha_4$. Express the root $\alpha_1$ explicitly in terms of a succession of square
roots.


3. What can you say about the Galois group of an irreducible quartic polynomial over $\mathbb{Q}$
which has exactly two real roots?

4. Suppose that a real quartic polynomial has a positive discriminant. What can you say
about the number of real roots?

5. Let $K$ be the splitting field of a reducible quartic polynomial with distinct roots over a
field $F$. What are the possible Galois groups of $K/F$?

6. What are the possible Galois groups over $\mathbb{Q}$ of an irreducible quartic polynomial $f(x)$
whose discriminant is negative?

7. Let $g$ be the resolvent cubic of an irreducible quartic polynomial $f \in F[x]$. Determine
the possible Galois groups of $g$ over $F$, and in each case, say what you can about the
Galois group of $f$.

8. Let $K$ be the splitting field of a polynomial $f \in F[x]$ with distinct roots $\alpha_1, \ldots, \alpha_n$, and
let $G = G(K/F)$. Then $G$ may be regarded as a subgroup of the symmetric group $S_n$.
Prove that a change of numbering of the roots changes $G$ to a conjugate subgroup.

9. Let $\alpha_1, \ldots, \alpha_4$ be the roots of a quartic polynomial. Discuss the symmetry of the elements
$\alpha_1\alpha_2$ and $\alpha_1 + \alpha_2$ along the lines of the discussion in the text.

10. Find a quartic polynomial over $\mathbb{Q}$ whose Galois group is (a) $S_4$, (b) $D_4$, (c) $C_4$.

11. Let $\alpha$ be the real root of a quartic polynomial $f$ over $\mathbb{Q}$. Assume that the resolvent cubic
is irreducible. Prove that $\alpha$ can't be constructed by ruler and compass.

12. Determine the Galois groups of the following polynomials over $\mathbb{Q}$.


(a) x* +4024 2. (b)ex* + 2x? +4 (©) x*4+ 4? -5 @ xt‘ -2 (© x*+2 
6) aber bn Hees ta tat tx t+ 1 OW) ate? + 4 


13. Compute the discriminant of the quartic polynomial $x^4 + ax + b$, using the formula in
Lemma (8.7).

*14. Let $f$ be an irreducible quartic polynomial over $F$ of the form $x^4 + rx + s$, and let
$\alpha_1, \alpha_2, \alpha_3, \alpha_4$ be the roots of $f$ in a splitting field $K$. Let $\eta = \alpha_1\alpha_2$.
(a) Prove that $\eta$ is the root of a sextic polynomial $h(x)$ with coefficients in $F$.
(b) Assume that the six products $\alpha_i\alpha_j$ are distinct. Prove that $h(x)$ is irreducible, or else
it has an irreducible quadratic factor.
(c) Describe the possibilities for the Galois group $G = G(K/F)$ in the following three
cases: $h$ is irreducible, $h$ is a product of an irreducible quadratic and an irreducible
quartic, and $h$ is the product of three irreducible quadratics.
(d) Describe the situation when some of the products are equal.

15. Let $K$ be the splitting field of the polynomial $x^4 - 3$ over $\mathbb{Q}$.
(a) Prove that $[K : \mathbb{Q}] = 8$ and that $K$ is generated by $i$ and a single root $\alpha$ of the
polynomial.
(b) Prove that the Galois group of $K/\mathbb{Q}$ is dihedral, and describe the operation of the
elements of $G$ on the generators of $K$ explicitly.

16. Let $K$ be the splitting field over $\mathbb{Q}$ of the polynomial $x^4 - 2x^2 - 1$. Determine the
Galois group $G$ of $K/\mathbb{Q}$, find all intermediate fields, and match them up with the
subgroups of $G$.

17. Let $f(x)$ be a quartic polynomial. Prove that the discriminants of $f$ and of its resolvent
cubic are equal.

18. Prove the irreducibility of the polynomial (6.17) and of its resolvent cubic.

19. Let K be the splitting field of the reducible polynomial (x — 1)*(x? + 1) over Q. Prove 
that 6 € Q, but that G(K/Q) is not contained in the alternating group. 

20. Let $f(x)$ be a quartic polynomial with distinct roots, whose resolvent cubic $g(x)$ splits
completely in the field $F$. What are the possible Galois groups of $f(x)$?


21. Let $\zeta = e^{2\pi i/3}$ be the cube root of 1, let $\alpha = \sqrt[3]{a + b\sqrt{2}}$, and let $K$ be the splitting field
of the irreducible polynomial for $\alpha$ over $\mathbb{Q}(\zeta)$. Determine the possible Galois groups of $K$
over $\mathbb{Q}(\zeta)$.

22. Let $\mathcal{H}$ be a subgroup of the symmetric group $S_n$. Given any monomial $m$, we can form
the polynomial $p(u) = \sum_{\sigma \in \mathcal{H}} \sigma m$. Show that if $m = u_1u_2^2u_3^3 \cdots u_{n-1}^{n-1}$, then $p(u)$ is
partially symmetric for $\mathcal{H}$; that is, it is fixed by the permutations in $\mathcal{H}$ but not by any
other permutations.

23. Let $p(u)$ be the polynomial formed as in the last problem, with $\mathcal{H} = A_n$. Then the orbit
of $p(u)$ contains two elements, say $p(u), q(u)$. Prove that $p(u) - q(u) = \pm\delta(u)$.

24. Determine the possible Galois groups of a reducible quartic equation of the form
$x^4 + bx^2 + c$, assuming that the quadratic $y^2 + by + c$ is irreducible.

25. Compute the discriminant of the polynomial $x^4 + rx + s$ by evaluating the
discriminants of $x^4 - x$ and $x^4 - 1$.

26. Use the substitution x~—~y' to determine the discriminant of the polynomial 
xe eam 4B: 

27. Determine the resolvent cubic of the polynomials (a) $x^4 + rx + s$ and (b) $x^4 + a_1x^3 +
a_2x^2 + a_3x + a_4$.

28. Let f(x) = x* — 2rx? + (r? — sv), with r,s,o © F. Assume that f is irreducible, and 
let G denote its Galois group. Let L = F(Vv,6), where 5? = D. Prove each statement. 
(a) L(a) = K 
(b) If [L : F] = 4, then G = Da. 

(c) If [L : F] = 2 and 6 € F, then G = Cy. 

29. Determine the Galois groups of the last two examples of (6.5). 

30. Determine the action of the Galois group $G$ on the roots $\{\alpha, \alpha', -\alpha, -\alpha'\}$ (6.7)
explicitly, assuming that (a) $G = C_4$, (b) $G = D_4$.

31. Determine whether or not the following nested radicals can be written in terms of 
unnested ones, and if so, find an expression. 


(a) V2+V11 (b) V64VI1 (ce) V114+6V2 @ V114+V6 


*32. Let $K$ be the splitting field of a quartic polynomial $f(x)$ over $\mathbb{Q}$, whose Galois group is
$D_4$, and let $\alpha$ be a real root of $f(x)$ in $K$. Decide whether or not $\alpha$ can be constructed by
ruler and compass if (a) all four roots of $f$ are real, (b) $f$ has two real roots.


33. Can the roots of the polynomial $x^4 + x - 5$ be constructed by ruler and compass?


7. Kummer Extensions


1. Suppose that a Galois extension $K/F$ has the form $K = F(\alpha)$ and that for some integer $n$,
$\alpha^n \in F$. What can you say about the Galois group of $K/F$?

*2. Let $a$ be an element of a field $F$, and let $p$ be a prime. Suppose that the polynomial
$x^p - a$ is reducible in $F[x]$. Prove that it has a root in $F$.

3. Let $F$ be a subfield of $\mathbb{C}$ which contains $i$, and let $K$ be a Galois extension of $F$ whose
group is $C_4$. Is it true that $K$ has the form $F(\alpha)$, where $\alpha^4 \in F$?

4. Let $f(x) = x^3 + px + q$ be an irreducible polynomial over a field $F$, with roots
$\alpha_1, \alpha_2, \alpha_3$. Let $\beta = \alpha_1 + \zeta\alpha_2 + \zeta^2\alpha_3$, where $\zeta = e^{2\pi i/3}$. Show that $\beta$ is an eigenvector
of $\sigma$ for the cyclic permutation of the roots unless $\beta = 0$, and compute $\beta^3$ explicitly in
terms of $p, q, \delta, \zeta$.


. Let K be a splitting field of an irreducible polynomial f(x) € F[x] of degree p whose 


Galois group is a cyclic group of order p generated by o”, and suppose that F contains the 
pth root of unity ¢ = g,. Let a@,a,...,ap be the roots of f in K. Show that 
B= a + f’a + az + -- + [ay is an eigenvector of a, with eigenvalue £”; 
unless it is zero. 


6. Let $f(x) = x^3 + px + q$ be an irreducible polynomial over a subfield $F$ of the complex
numbers, with complex roots $\alpha = \alpha_1, \alpha_2, \alpha_3$. Let $K = F(\alpha)$.
(a) Express $(6\alpha^2 + 2p)^{-1}$ explicitly, as a polynomial of degree 2 in $\alpha$.
(b) Assume that $\delta = \sqrt{D}$ is in $F$, so that $K$ contains the other roots of $f$. Express $\alpha_2$ as a
polynomial in $\alpha = \alpha_1$ and $\delta$.
(c) Prove that $(1, \alpha_1, \alpha_2)$ is a basis of $K$, as $F$-vector space.
(d) Let $\sigma$ be the automorphism of $K$ which permutes the three roots cyclically. Write the
matrix of $\sigma$ with respect to the above basis, and find its eigenvalues and
eigenvectors.
(e) Let $v$ be an eigenvector with eigenvalue $\zeta = e^{2\pi i/3}$. Prove that if $\sqrt{-3} \in F$ then
$v^3 \in F$. Compute $v^3$ explicitly, in terms of $p, q, \delta, \sqrt{-3}$.
(f) Dropping the assumptions that $\delta$ and $\sqrt{-3}$ are in $F$, express $v$ in terms of radicals.
(g) Without calculation, determine the element $v'$ which is obtained from $v$ by
interchanging the roles of $\alpha_1, \alpha_2$.
(h) Express the root $\alpha_1$ in terms of radicals.


8. Cyclotomic Extensions


ANG 


Determine the degree of £; over the field Q(é;). 


2. Let ¢ = gj, and let K = Q(Q). Determine the intermediate field of degree 3 over Q ex- 


plicitly. 


. Let ¢ = %7. Determine the succession of square roots which generate the field 


Q(é + £'%) explicitly. 


. Let £ = &. Determine the degree of the following elements over Q. 


(Gate wD) 2 1 wee) (°° + o> + 

Let £ = ¢3. Determine the degree of the following elements over Q. 

Guest” (beet ©) C427 + f° dd) er te 

Lt Otte? HLH Lt +l? MLL tS + e+ orse 
Let £ = Zi. 

(a) — a=f£4+ 040+ 05+ © generates a field of degree 2 over Q, and 


find its equation. 
(b) Find an element which generates a subfield of degree 5 over Q, and find its equation. 


7. Prove that every quadratic extension of $\mathbb{Q}$ is contained in a cyclotomic extension.

8. Let $K = \mathbb{Q}(\zeta_n)$.
(a) Prove that $K$ is a Galois extension of $\mathbb{Q}$.
(b) Define an injective homomorphism $v: G(K/\mathbb{Q}) \to U$ to the group $U$ of units in the
ring $\mathbb{Z}/(n)$.
(c) Prove that this homomorphism is bijective when $n = 6, 8, 12$. (Actually, this map is
always bijective.)

*9. Let $p$ be a prime, and let $a$ be a rational number which is not a $p$th power. Let $K$ be a
splitting field of the polynomial $x^p - a$ over $\mathbb{Q}$.
(a) Prove that $K$ is generated over $\mathbb{Q}$ by a $p$th root $\alpha$ of $a$ and a primitive $p$th root $\zeta$ of
unity.
(b) Prove that $[K : \mathbb{Q}] = p(p - 1)$.

(c) Prove that the Galois groups of K/@ is isomorphic to the group of invertible 2 x 2 


matrices with entries in F, of the form ij ‘h and describe the actions of the ele- 


ments |“ | and [: 4 on the generators explicitly. 


10. Determine the Galois group of the polynomials x* — 1, x"? — 1, x° — 1. 
11. (a) Characterize the primes p such that the regular p-gon can be constructed by ruler and 
compass. 
(b) Extend the characterization to the case of an n-gon, where n is not necessarily prime. 
*12. Let v be a primitive element modulo a prime p, and let d be a divisor of p − 1. Show
how to determine a sum of powers of ζ = ζ_p which generates the subfield L of Q(ζ) of
degree d over Q, using the list of roots of unity {ζ, ζ^v, ζ^(v^2), ..., ζ^(v^(p−2))}.


9. Quintic Equations 


1. Determine the transitive subgroups of S_5.

2. Let G be the Galois group of an irreducible quintic polynomial. Show that if G contains
an element of order 3, then G = S_5 or A_5.

*3. Let p be a prime integer, and let G be a p-group. Let H be a proper normal subgroup of
G.
(a) Prove that the normalizer N(H) of H is strictly larger than H.
(b) Prove that H is contained in a subgroup of index p and that that subgroup is normal
in G.
(c) Let K be a Galois extension of Q whose degree is a power of 2, and such that
K ⊂ R. Prove that the elements of K can be constructed by ruler and compass.

4. Let K ⊃ L ⊃ F be a tower of field extensions of degree 2. Show that K can be generated
over F by the root of an irreducible quartic polynomial of the form x^4 + bx^2 + c.
*5. Cardano’s Formula has a peculiar feature: Suppose that the coefficients p, q of the cubic
are real numbers. A real cubic always has at least one real root. However, the square
root appearing in the formula (2.6) will be imaginary if (q/2)^2 + (p/3)^3 < 0. In that
case, the real root is displayed in terms of an auxiliary complex number u. This was
considered to be an improper solution in Cardano’s time. Let f(x) be an irreducible cubic
over a subfield F of R, which has three real roots. Prove that no root of f is expressible
by real radicals, that is, that there is no tower F = F_0 ⊂ ... ⊂ F_r as in (9.2), in which
all the fields are subfields of R.

6. Let f(x) ∈ F[x] be an irreducible quintic polynomial, and let K be a splitting field for
f(x) over F.
(a) What are the possible Galois groups G(K/F), assuming that the discriminant D is a
square in F?
*(b) What are the possible Galois groups if D is not a square in F?
7. Determine which real numbers α of degree 4 over Q can be constructed with ruler and
compass in terms of the Galois group of the corresponding polynomial.
8. Is every Galois extension of degree 10 solvable by radicals?
9. Find a polynomial of degree 7 over Q whose Galois group is S_7.


Miscellaneous Problems 


1. Let K be a Galois extension of F whose Galois group is the symmetric group S_3. What
numbers occur as degrees of elements of K over F?

2. Show without computation that the side length of a regular pentagon inscribed in the unit
circle has degree 2 over Q.

3. (a) The nonnegative real numbers are those having a real square root. Use this fact to
prove that the field R has no automorphism except the identity.
(b) Prove that C has no continuous automorphisms except for complex conjugation and
the identity.

4. Let K/F be a Galois extension with Galois group G, and let H be a subgroup of G. Prove
that there exists an element β ∈ K whose stabilizer is H.

5. (a) Let K be a field of characteristic p. Prove that the Frobenius map φ defined by
φ(x) = x^p is a homomorphism from K to itself.
(b) Prove that φ is an isomorphism if K is a finite field.
(c) Give an example of an infinite field of characteristic p such that φ is not an
isomorphism.
(d) Let K = F_q, where q = p^r, and let F = F_p. Prove that G(K/F) is a cyclic group of
order r, generated by the Frobenius map φ.
(e) Prove that the Main Theorem of Galois theory holds for the field extension K/F.

6. Let K be a subfield of C, and let G be its group of automorphisms. We can view G as
acting on the point set K in the complex plane. The action will probably be discontinuous,
but nevertheless, we can define an action on line segments [α, β] whose endpoints
are in K, by defining g[α, β] = [gα, gβ]. Then G also acts on polygons whose vertices
are in K.
(a) Let K = Q(ζ), where ζ is a primitive fifth root of 1. Find the G-orbit of the regular
pentagon whose vertices are 1, ζ, ζ^2, ζ^3, ζ^4.
(b) Let α be the side length of the pentagon of (a). Show that α^2 ∈ K, and find the
irreducible equation for α over Q. Is α ∈ K?

7. A polynomial f ∈ F[u_1, ..., u_n] is called ½-symmetric if f(u_σ1, ..., u_σn) = f(u_1, ..., u_n)
for every even permutation σ of the indices, and skew-symmetric if f(u_σ1, ..., u_σn) =
(sign σ) f(u_1, ..., u_n) for every permutation σ.
(a) Prove that the square root of the discriminant δ = ∏_{i<j} (u_i − u_j) is skew-symmetric.
(b) Prove that every ½-symmetric polynomial has the form f + gδ, where f, g are
symmetric polynomials.

8. Let f(x, y) ∈ C[x, y] be an irreducible polynomial, which we regard as a polynomial
f(y) in y. Assume that f is cubic as a polynomial in y. Its discriminant D, computed
with regard to the variable y, will be a polynomial in x. Assume that there is a root x_0 of
D(x) which is not a multiple root.
(a) Prove that the polynomial f(x_0, y) in y has one simple root and one double root.
(b) Prove that the splitting field K of f(y) over C(x) has degree 6.

9. Let K be a subfield of C which is a Galois extension of Q. Prove or disprove: Complex
conjugation carries K to itself, and therefore it defines an automorphism of K.

10. Let K be a finite extension of a field F, and let f(x) ∈ K[x]. Prove that there is a
nonzero polynomial g(x) ∈ K[x] such that f(x)g(x) ∈ F[x].

11. Let f(x) be an irreducible quartic polynomial in F[x]. Let α_1, α_2, α_3, α_4 be its roots in a
splitting field K. Assume that the resolvent cubic has a root β = α_1α_2 + α_3α_4 in F, but
that the discriminant D is not a square in F. According to the text, the Galois group of
K/F is either C_4 or D_4.
(a) Determine the subgroup H of the group S_4 of permutations of the roots α_i which
stabilizes β explicitly. Don’t forget to prove that no permutations other than those you
list fix β.
(b) Let γ = α_1α_2 − α_3α_4 and ε = α_1 + α_2 − α_3 − α_4. Describe the action of H on these
elements.
(c) Prove that γ^2 and ε^2 are in F.
(d) Let δ be the square root of the discriminant. Prove that if γ ≠ 0, then δγ is a square
in F if and only if G = C_4. Similarly, prove that if ε ≠ 0, then δε is a square in F if
and only if G = C_4.
(e) Prove that γ and ε can’t both be zero.

12. Let F = F_p(u, v) be a rational function field in two variables over the field F_p with p
elements, and let K = F(α, β), where α, β are roots of the polynomials x^p − u and
x^p − v respectively. Prove the following.
(a) The extension K/F has no primitive element.
(b) The elements γ = β + cα, where c ∈ F, generate infinitely many different
intermediate fields L.

13. Let K be a field with p^r elements. Prove that the Frobenius map defined by φ(x) = x^p is
a linear transformation of K, when K is viewed as a vector space over the prime field
F = F_p, and determine its eigenvectors and eigenvalues.


How far these methods will reach,
only the future can show.


Emmy Noether 


Appendix 


Background Material 


Historically speaking, it is of course quite untrue 

that mathematics is free from contradiction; 

non-contradiction appears as a goal to be achieved, 

not as a God-given quality that has been granted us once for all. 


Nicolas Bourbaki 


1. SET THEORY


This section reviews some conventions about set theory which are used in this book, 
as well as some facts which will be referred to occasionally. 
First, a remark about definitions: Any definition of a word or a phrase will
have roughly the form

(1.1) xxx if @#&$%,

where xxx is the word which is being defined and @#&$% is its defining property.
For example, the sentence “An integer n is positive if n > 0” defines the notion of a
positive integer. In a definition, the word if means if and only if. So in the definition
of the positive integers, all integers which don’t satisfy the requirement n > 0 are
ruled out.

The notation 


(1.2) {s ∈ S | @#&$%}


stands for the subset of S consisting of all elements s such that @#&$% is true.
Thus if Z denotes the set of all integers, then N = {n ∈ Z | n > 0} describes N as
the set of positive integers or natural numbers.

Elements a_1, ..., a_n of a set are said to be distinct if no two of them are equal.

A map φ from a set S to a set T is any function whose domain of definition is S
and whose range is T. The words function and map are used synonymously. We require
that a function be single-valued. This means that every element s ∈ S must
have a uniquely determined image φ(s) ∈ T. The range T of φ is not required to be
the set of values of the function. By definition of a function, every image element
φ(s) is contained in T, but we allow the possibility that some elements t ∈ T are
not taken on by the function at all. We also take the domain and range of a function
as part of its definition. If we restrict the domain to a subset, or if we extend the
range, then the function obtained is considered to be different.

The domain and range of a map may also be described by the use of an arrow.
Thus the notation φ: S → T tells us that φ is a map from S to T. The statement that
t = φ(s) may be described by a wiggly arrow: s ⇝ t means that the element s ∈ S is
sent to t ∈ T by the map under consideration. For example, the map φ: Z → Z such
that φ(n) = 2n + 1 is described by n ⇝ 2n + 1.

The image of the map φ is the subset of T of elements which have the form
φ(s) for some s ∈ S. It will often be denoted by im φ, or by φ(S):


(1.3) im φ = {t ∈ T | t = φ(s) for some s ∈ S}.


In case im φ is the whole range T, the map is said to be surjective. Thus φ is surjective
if every t ∈ T has the form φ(s) for some s ∈ S.

The map φ is called injective if distinct elements s_1, s_2 of S have distinct images,
that is, if s_1 ≠ s_2 implies that φ(s_1) ≠ φ(s_2). A map which is both injective
and surjective is called a bijective map. A permutation of a set S is a bijective map
from S to itself.

Let φ: S → T and ψ: T → S be two maps. Then ψ is called an inverse function
of φ if both of the composed maps φ ∘ ψ: T → T and ψ ∘ φ: S → S are the identity
maps, that is, if φ(ψ(t)) = t for all t ∈ T and ψ(φ(s)) = s for all s ∈ S. The inverse
function is often denoted by φ^{-1}.


(1.4) Proposition. A map φ: S → T has an inverse function if and only if it is bijective.


Proof. Assume that φ has an inverse function ψ, and let us show that φ is both
surjective and injective. Let t be any element of T, and let s = ψ(t). Then φ(s) =
φ(ψ(t)) = t. So t is in the image of φ. This shows that φ is surjective. Next, let
s_1, s_2 be distinct elements of S, and let t_i = φ(s_i). Then ψ(t_i) = s_i. So t_1, t_2 have
distinct images in S, which shows that they are distinct. Therefore φ is injective.
Conversely, assume that φ is bijective. Then since φ is surjective, every element t ∈ T
has the form t = φ(s) for some s ∈ S. Since φ is injective, there can be only one
such element s. So we define ψ by the following rule: ψ(t) is the unique element
s ∈ S such that φ(s) = t. This map is the required inverse function. □
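For finite sets, the construction in this proof can be carried out mechanically. The following is a minimal sketch, assuming a map is stored as a Python dictionary from elements of S to elements of T; the names is_injective, is_surjective and inverse are illustrative and do not come from the text.

```python
def is_injective(phi):
    """Distinct elements of the domain have distinct images."""
    images = list(phi.values())
    return len(set(images)) == len(images)


def is_surjective(phi, T):
    """Every element of the range T is the image of some element."""
    return set(phi.values()) == set(T)


def inverse(phi, T):
    """Return psi with psi[phi[s]] == s, or None if phi is not bijective."""
    if not (is_injective(phi) and is_surjective(phi, T)):
        return None
    # psi(t) is the unique s with phi(s) = t, as in the proof.
    return {t: s for s, t in phi.items()}


S = [1, 2, 3]
T = ["a", "b", "c"]
phi = {1: "b", 2: "c", 3: "a"}   # bijective
chi = {1: "a", 2: "a", 3: "b"}   # neither injective nor surjective
psi = inverse(phi, T)
```

Here psi composed with phi is the identity on S and phi composed with psi is the identity on T, while inverse(chi, T) returns None.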


Let φ: S → T be a map, and let U be a subset of T. The inverse image of U is
defined to be the set


(1.5) φ^{-1}(U) = {s ∈ S | φ(s) ∈ U}.


This set is defined whether or not φ has an inverse function. The notation φ^{-1}, as
used here, is symbolic.
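A small sketch may make the point concrete: the inverse image can be computed for a map that has no inverse function. The function name inverse_image and the squaring example are assumptions chosen for illustration.

```python
def inverse_image(phi, S, U):
    """Return the set of s in S whose image phi(s) lies in U."""
    return {s for s in S if phi(s) in U}


S = range(-3, 4)            # the domain {-3, ..., 3}


def square(s):
    # Not injective on S, so it has no inverse function.
    return s * s


preimage = inverse_image(square, S, {1, 4})
empty = inverse_image(square, S, {2})
```

The set preimage contains both signs of each square root, and empty is the empty set because 2 is not a value of the map.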

A set is called finite if it contains finitely many elements. If so, the number of
its elements, sometimes called its cardinality, will be denoted by |S|. We will also
call this number the order of S. If S is infinite, we write |S| = ∞. The following
theorem is quite elementary, but it is a very important principle.


(1.6) Theorem. Let φ: S → T be a map between finite sets.


(a) If φ is injective, then |S| ≤ |T|.

(b) If φ is surjective, then |S| ≥ |T|.

(c) If |S| = |T|, then φ is bijective if and only if it is either injective or surjective. □


The contrapositive of part (a) is often called the pigeonhole principle: If |S| > |T|,
then φ is not injective. For example, if there are 87 socks in 79 drawers, then some
drawer contains at least two socks.
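Theorem (1.6) can be checked exhaustively for very small sets. The sketch below enumerates every map between sets with at most three elements and tests the three parts; it is only an illustration, not a proof, and the helper names are assumptions.

```python
from itertools import product


def all_maps(S, T):
    """Yield every map S -> T as a dictionary."""
    for images in product(T, repeat=len(S)):
        yield dict(zip(S, images))


def injective(phi):
    return len(set(phi.values())) == len(phi)


def surjective(phi, T):
    return set(phi.values()) == set(T)


def check_theorem(m, n):
    """Check (1.6)(a)-(c) for all maps from an m-element set to an n-element set."""
    S, T = list(range(m)), list(range(n))
    for phi in all_maps(S, T):
        if injective(phi) and not m <= n:
            return False
        if surjective(phi, T) and not m >= n:
            return False
        if m == n and injective(phi) != surjective(phi, T):
            return False
    return True


# One particular way of putting 87 socks into 79 drawers.
socks = {s: s % 79 for s in range(87)}
```

Whatever assignment of socks to drawers is chosen, injective(socks) must be False; the dictionary above is just one such assignment.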

An infinite set S is called countable if there is a bijective map φ: N → S from
the set of natural numbers to S. If there is no such map, then S is said to be
uncountable.


(1.7) Proposition. The set R of real numbers is uncountable. 


Proof. This proof is often referred to as Cantor’s diagonal argument. Let
φ: N → R be any map. We list the elements of the image of φ in the order φ(1), φ(2),
φ(3), ..., and we write each of these real numbers in decimal notation. For example,
the list might begin as follows:


g1)=82 35470984534... 
gy(2) = 23930 1384S. 700K)... 
GBi= S9:0s84 05.98 67 5S... 
g(4)=12 87435264444... 
(5) = 00144100349.. 


We will now determine a real number which is not on the list. Consider the real
number u whose decimal expansion consists of the underlined digits, that is, the nth
digit of φ(n) for each n: u = .3 2 8 3 4 ... . We form a new real number by changing
each of these digits, say


v = .4 5 ... .


Notice that v ≠ φ(1), because the first digit, 4, of v is not equal to the corresponding
digit, 3, of φ(1). Also, v ≠ φ(2), because the second digit, 5, of v is not equal
to the corresponding digit of φ(2). Similarly, v ≠ φ(n) for all n. This shows that φ
is not surjective, which completes the proof, except for one point.

Some real numbers have two decimal expansions: .99999... is equal to
1.00000..., for example. This creates a problem with our argument. We have to
choose v so that infinitely many of its digits are different from 9 and 0. The easiest
way is to avoid these digits altogether. □
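The diagonal construction can be sketched for a finite initial segment of the list. In the sketch below, each real number is represented by a string of its digits after the decimal point; the sample digit strings and the rule for changing a digit (use 4, or 5 if the digit is already 4) are assumptions made for illustration. The rule uses only the digits 4 and 5, so it avoids 0 and 9 as the proof requires.

```python
def diagonal_escape(expansions):
    """Return digits that differ from the kth expansion in its kth digit.

    expansions: list of n digit strings, each of length at least n.
    Only the digits 4 and 5 are used, so the result never contains 0 or 9.
    """
    digits = []
    for k, x in enumerate(expansions):
        digits.append("5" if x[k] == "4" else "4")
    return "".join(digits)


# Hypothetical digit strings whose diagonal is 3, 2, 8, 3, 4.
expansions = ["35470", "92301", "05840", "87435", "00144"]
v = diagonal_escape(expansions)
```

With these sample strings the result is v = "44445", which differs from the kth string in the kth position for every k.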


At a few places in the text, we refer to Zorn’s Lemma, which is a tool for handling
uncountable sets. We will now describe it. A partial ordering of a set S is a
relation s ≤ s' which may hold between certain elements and which satisfies the
following axioms for all s, s', s'' in S:


(1.8)

(i) s ≤ s;
(ii) if s ≤ s' and s' ≤ s'', then s ≤ s'';
(iii) if s ≤ s' and s' ≤ s, then s = s'.


A partial ordering is called a total ordering if in addition

(iv) for all s, s' in S, s ≤ s' or s' ≤ s.


For example, let S be a set whose elements are sets. If A, B are in S, we may
define A ≤ B if A ⊂ B. This is a partial ordering on S, called the ordering by inclusion.
Whether or not it is a total ordering depends on the particular case.

If A is a subset of a partially ordered set S, then an upper bound for A is an
element b ∈ S such that for all a ∈ A, a ≤ b. A partially ordered set S is called
inductive if every totally ordered subset T of S has an upper bound in S.

A maximal element m ∈ S is any element such that S contains no larger one,
that is, such that there is no element s ∈ S with m ≤ s, except for m itself. This
doesn’t mean that m is an upper bound for S; in particular, there may be many different
maximal elements. For example, the set of all proper subsets of {1, ..., n} contains
n maximal elements, one of which is {1, 3, 4, ..., n}.
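The example of the proper subsets of {1, ..., n} can be checked directly for a small n. The sketch below orders sets by inclusion and lists the maximal elements; the function names are illustrative assumptions.

```python
from itertools import combinations


def proper_subsets(n):
    """All proper subsets of {1, ..., n}, as frozensets."""
    universe = range(1, n + 1)
    return [frozenset(c) for r in range(n) for c in combinations(universe, r)]


def maximal_elements(family):
    """Members m such that no other member strictly contains m."""
    return [m for m in family if not any(m < s for s in family)]


family = proper_subsets(4)
maximal = maximal_elements(family)

# An upper bound for the whole family would have to contain every member.
upper_bounds = [b for b in family if all(s <= b for s in family)]
```

For n = 4 there are four maximal elements, each obtained by deleting one element, and none of them is an upper bound for the whole family.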


(1.9) Lemma. Zorn’s Lemma: An inductive partially ordered set has a maximal
element. □


Zorn’s Lemma is equivalent to the axiom of choice, which is known to be independent
of the basic axioms of set theory. We will not enter into a further discussion
of this equivalence, but we will show how Zorn’s Lemma can be used to show that
every vector space has a basis. Let us use unordered sets of vectors here.


(1.10) Proposition. Every vector space V over a field has a basis.


Proof. We take for S the set of (unordered) linearly independent subsets of V,
partially ordered by inclusion, as above. We check that S is inductive: Let T be a
totally ordered subset of S. Then we claim that the union of the sets making up T is
also linearly independent; hence it is in S. To verify this, let

B = ∪_{A ∈ T} A

be the union. By definition, a relation of linear dependence on B is finite, so it can
be written in the form


(1.11) c_1 v_1 + ··· + c_n v_n = 0,

with v_i ∈ B. Since B is a union of the sets A ∈ T, each v_i is contained in one of
these subsets, call it A_i. Let i, j be two of the indices. Since T is totally ordered,
A_i ⊂ A_j or else A_j ⊂ A_i. It follows by induction that one of the sets, say A_j, contains
all the others. Call this set A. Then v_i ∈ A for all i = 1, ..., n. Since A is linearly
independent, (1.11) is the trivial relation. This shows that B is linearly independent,
hence that it is an element of S.

We have verified the hypothesis of Zorn’s Lemma. So S contains a maximal element
B, and we claim that B is a basis. By definition of S, B is linearly independent.
Let W = Span (B). If W < V, then we choose an element v ∈ V which is not
in W. Then the set B ∪ {v} is linearly independent [see Chapter 3 (3.10)]. This
contradicts the maximality of B and shows that W = V, hence that B is a basis. □


A similar argument proves Theorem (8.3) of Chapter 10. 


(1.12) Proposition. Let R be a ring. Every ideal I ≠ R is contained in a maximal
ideal.


We leave this proof as an exercise. □


2. TECHNIQUES OF PROOF 


Exactly what mathematicians consider an appropriate way to present a proof is not 
clearly defined. It isn’t customary to give proofs which are complete in the sense 
that every step consists in applying a rule of logic to the previous step. Writing such 
a proof would take too long, and the main points wouldn’t be emphasized. On the 
other hand, all difficult steps of the proof are supposed to be included. Someone 
reading the proof should be able to fill in as many details as needed to understand it. 
How to write a proof is a skill which can be learned only by experience.
We will discuss three important techniques used to construct proofs: dichotomy,
induction, and contradiction.
The word dichotomy means division into two parts. It is used to subdivide a
problem into smaller, more easily managed pieces. Other names for this procedure
are case analysis and divide and conquer. Here is an example of dichotomy: One
definition of the binomial coefficient \binom{n}{k} (read n choose k) is that \binom{n}{k} is the
number of subsets of order k in the set {1, 2, ..., n}. For example, \binom{4}{2} = 6: The six
subsets of order 2 of {1, 2, 3, 4} are {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}.


(2.1) Proposition. For every integer n and every k ≤ n, \binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}.


Proof. Let S be a subset of {1, 2, ..., n} of order k. Then either n ∉ S or
n ∈ S. This is our dichotomy. If n ∉ S, then S is actually a subset of {1, 2, ..., n − 1}.
By definition, there are \binom{n-1}{k} of these subsets. Suppose that n ∈ S, and let S' =
S − {n} be the set obtained by deleting the element n from the set S. Then S' is a
subset of {1, 2, ..., n − 1}, of order k − 1. There are \binom{n-1}{k-1} such sets S'. Hence there
are \binom{n-1}{k-1} subsets of order k which contain n. This gives us \binom{n-1}{k} + \binom{n-1}{k-1} subsets of
order k altogether. □

The remarkable power of the method of dichotomy is shown here: In each of
the two cases, n ∉ S and n ∈ S, we have an additional fact about our set S. This
additional fact can be used in the proof.
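The dichotomy used in the proof of (2.1) can be traced on small examples by listing the subsets and sorting them according to whether they contain n. This is a sketch for illustration; the function names are assumptions.

```python
from itertools import combinations


def subsets_of_order(n, k):
    """All subsets of {1, ..., n} having exactly k elements."""
    return [set(c) for c in combinations(range(1, n + 1), k)]


def split_by_dichotomy(n, k):
    """Sort the subsets of order k into those without n and those with n."""
    all_subsets = subsets_of_order(n, k)
    without_n = [S for S in all_subsets if n not in S]
    with_n = [S for S in all_subsets if n in S]
    return without_n, with_n


without_4, with_4 = split_by_dichotomy(4, 2)
```

For n = 4 and k = 2 the six subsets split into three that omit 4 and three that contain 4, matching the two terms on the right side of (2.1).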

Often a proof will require sorting through several possibilities, examining each
in turn. This is dichotomy, or case analysis. For instance, to determine the species of
a plant, Gray’s Manual of Botany leads through a sequence of dichotomies. A typical
one is “leaves opposite on the stem (go to h), or leaves alternate (go to k).”
Classification of mathematical structures will also proceed through a sequence of
dichotomies. They need not be spelled out formally in simple cases, but when one is
dealing with a complicated range of possibilities, careful sorting is needed. Here is a
simple example:


(2.2) Proposition. Every group of order 4 is abelian. 


Proof. Let G be a group of order 4, and let x, y be two elements of G. We are
to show that xy = yx. Consider the five elements 1, x, y, xy, yx. Since there are only
four elements in the group, two of these must be equal. If xy = yx, the proposition
is verified. We now run through the other possibilities:


Case 1: x = 1 or y = 1. If x = 1, then xy = y = yx. If y = 1, then xy = x = yx.

Case 2: xy = 1 or yx = 1. Then y = x^{-1} and xy = 1 = yx.

Case 3: x = y. Then xy = x^2 = yx.

Case 4: Either xy = x, yx = x, xy = y, or yx = y. In the first two cases, we cancel
x to conclude that y = 1, which puts us back in Case 1. In the last two
cases, we cancel y.


This exhausts all possibilities and completes the proof. □
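Proposition (2.2) can also be confirmed by brute force, which is a different technique from the case analysis above. The sketch below enumerates every multiplication table on the set {0, 1, 2, 3} with 0 as identity, keeps those satisfying the group axioms, and tests commutativity. The encoding of a group as a table and the function names are assumptions made for illustration.

```python
from itertools import product


def group_tables(n=4):
    """Yield each n x n table (identity = 0) that satisfies the group axioms."""
    elems = range(n)
    for entries in product(elems, repeat=(n - 1) ** 2):
        # Row 0 and column 0 are forced because 0 is the identity.
        table = [list(elems)]
        for a in range(1, n):
            table.append([a] + list(entries[(a - 1) * (n - 1): a * (n - 1)]))
        # Every element needs a right inverse: 0 must appear in each row.
        if any(0 not in row for row in table):
            continue
        # Associativity: (ab)c == a(bc) for all a, b, c.
        if all(table[table[a][b]][c] == table[a][table[b][c]]
               for a in elems for b in elems for c in elems):
            yield table


def is_abelian(table):
    n = len(table)
    return all(table[a][b] == table[b][a] for a in range(n) for b in range(n))


tables = list(group_tables(4))
```

Every table found is symmetric about the diagonal, that is, every group law on four labeled elements is commutative.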


Induction is the main method for proving a sequence of statements P_n, indexed
by positive integers n. To prove P_n for all n, the principle of induction requires us to
do two things:


(2.3)

(i) prove that P_1 is true, and
(ii) prove that if, for some integer k ≥ 1, P_k is true, then P_{k+1} is also true.


Sometimes it is more convenient to prove that if, for some integer k > 1, P_{k−1} is
true, then P_k is true. This is just a change of the index.
Here are some examples of induction:


(2.4) Proposition. The determinant of an upper triangular matrix is the product of 
its diagonal entries. 


Proof. Here P_n is the assertion that the proposition is true for an n × n triangular
matrix. In case of a 1 × 1 matrix, there is only one diagonal entry, and it is
equal to the determinant. This means that P_1 is true. We now assume that P_{k−1} is
true, and we prove P_k using that fact. Let A be a triangular k × k matrix. We expand
the determinant by minors on the first column:


det A = a_11 det A_11 − a_21 det A_21 + ···


Since A is triangular, the terms a_21, a_31, ..., a_k1 are all zero, so det A = a_11 det A_11.
Now notice that A_11 is a (k − 1) × (k − 1) triangular matrix and that its diagonal
entries are a_22, a_33, ..., a_kk. Since P_{k−1} is true by hypothesis, det A_11 is the product
a_22 ··· a_kk. Therefore det A = a_11 a_22 ··· a_kk, as required. □
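The induction in this proof corresponds to a recursive computation: expanding by minors on the first column reduces a k × k determinant to (k − 1) × (k − 1) determinants. The sketch below implements that expansion for matrices given as lists of rows; the function name det and the sample matrix are assumptions.

```python
def det(A):
    """Determinant by expansion by minors on the first column."""
    k = len(A)
    if k == 1:
        return A[0][0]
    total = 0
    for i in range(k):
        if A[i][0] == 0:
            continue  # For a triangular matrix only i = 0 contributes.
        # Minor A_i1: delete row i and the first column.
        minor = [row[1:] for j, row in enumerate(A) if j != i]
        total += (-1) ** i * A[i][0] * det(minor)
    return total


A = [[2, 7, 1],
     [0, 3, 5],
     [0, 0, 4]]
```

For this upper triangular matrix the recursion visits only the diagonal entries, and det(A) equals 2 · 3 · 4 = 24.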


(2.5) Proposition. \binom{n}{k} = n! / (k!(n − k)!).


Proof. Let P_r be the statement that \binom{r}{k} = r! / (k!(r − k)!) for all k = 1, ..., r.
Assume that P_{r−1} is true. Then the formula is true when we substitute n = r − 1 and
k = k and is also true when we substitute n = r − 1 and k = k − 1:


\binom{r-1}{k} = (r − 1)! / (k!(r − 1 − k)!)   and   \binom{r-1}{k-1} = (r − 1)! / ((k − 1)!(r − k)!).

According to Proposition (2.1), \binom{r}{k} = \binom{r-1}{k} + \binom{r-1}{k-1}. Thus

\binom{r}{k} = (r − 1)! / (k!(r − 1 − k)!) + (r − 1)! / ((k − 1)!(r − k)!)
             = ((r − k)/r) · r! / (k!(r − k)!) + (k/r) · r! / (k!(r − k)!)
             = r! / (k!(r − k)!).


This shows that P_r is true, as required. □
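The inductive step can be mirrored numerically: build each row of binomial coefficients from the previous row by the recurrence of (2.1), then compare with the factorial formula. This is a sketch; the boundary values \binom{r}{0} = \binom{r}{r} = 1 are supplied directly, which is an assumption of the sketch rather than part of the proof above.

```python
from math import factorial


def binomial_rows(r_max):
    """Rows 0..r_max of binomial coefficients, built by the recurrence (2.1)."""
    rows = [[1]]
    for r in range(1, r_max + 1):
        prev = rows[-1]
        # Interior entries come from the row above; the ends are 1.
        row = [1] + [prev[k] + prev[k - 1] for k in range(1, r)] + [1]
        rows.append(row)
    return rows


def formula(r, k):
    """The closed form r! / (k! (r - k)!)."""
    return factorial(r) // (factorial(k) * factorial(r - k))


rows = binomial_rows(10)
```

Each entry rows[r][k] agrees with formula(r, k), which is the content of the statement P_r for r up to 10.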


As another example, let us prove the pigeonhole principle (1.6a), that if a map
φ: S → T between finite sets is injective, then |S| ≤ |T|. We use induction on
n = |T|. The assertion is true if n = 0, that is, if T is empty, because the only set
which has a map to the empty set is the empty set.

We suppose that the theorem has been proved for n = k − 1, and we proceed
to check it for n = k, where k > 0. We suppose that |T| = k, and we choose an
element t ∈ T.


Case 1: t is in the image of φ. Since φ is injective, there is exactly one element
s ∈ S such that φ(s) = t. Let S' = S − {s} and T' = T − {t}. Restricting
φ to S', we obtain an injective map φ': S' → T'. Since |T'| =
|T| − 1 = k − 1, our induction hypothesis implies that |S'| ≤ |T'|.
Therefore |S| = |S'| + 1 ≤ |T'| + 1 = |T|.

Case 2: t is not in im φ. In this case the image of φ is contained in T' = T − {t}.
So φ defines an injective map S → T'. Our induction hypothesis again implies
that |S| ≤ |T'| = |T| − 1. □

There is a variant of the principle of induction, called complete induction. Here
again, we wish to prove a statement P_n for each positive integer n. The principle of
complete induction asserts that it is enough to prove the following statement:


(2.6) If n is a positive integer, and if P_k is true
for every positive integer k < n, then P_n is true.


When n = 1, there are no positive integers k < n. So the hypothesis of (2.6) is
automatically satisfied for n = 1. Hence a proof of (2.6) must include a proof of P_1.

The principle of complete induction is used when there is a procedure to reduce
P_n to P_k for some smaller integers k, but not necessarily to P_{n−1}. Here is an
example:


(2.7) Theorem. Every integer n > 1 is a product of prime integers.


An informal proof, which also exhibits an algorithm for finding a prime factorization,
goes as follows: If n is a prime integer, then it is the product of one prime, and
we are done. If not, then it has a divisor different from 1 and n. If n is given to us
explicitly, we will be able to check whether or not there is such a proper divisor. If
so, then n can be written as a product of integers, say n = ab, neither of which is 1,
and then a and b are less than n. We continue factoring a and b if possible. Since the
size of the factors decreases each time, this procedure can not be continued
indefinitely, and eventually we end up with a prime factorization of n.

The principle of complete induction formalizes the statement that one can’t
continue replacing a positive integer by a smaller one infinitely often. To apply the
principle, we let P_n be the statement that n is a product of primes, and we assume
that P_k is true for all k < n. We go through the argument again. Either n is prime, in
which case we are done, or else n = ab and a and b are less than n. In this case the
induction hypothesis tells us that P_a and P_b are both true, that is, that a and b are
products of primes. Putting these products side by side gives us the required
factorization of n.
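The factoring procedure described above translates into a short recursive function: the recursive calls on a and b play the role of the induction hypothesis, and they terminate because a and b are smaller than n. The sketch below finds a proper divisor by trial division, which is one possible choice and an assumption of the sketch.

```python
from math import isqrt


def prime_factors(n):
    """Return a list of primes whose product is n, for an integer n > 1."""
    if n <= 1:
        raise ValueError("n must be an integer greater than 1")
    for a in range(2, isqrt(n) + 1):
        if n % a == 0:
            b = n // a
            # n = ab with 1 < a, b < n: factor both and put the lists side by side.
            return prime_factors(a) + prime_factors(b)
    # No proper divisor was found, so n is prime.
    return [n]


factors = prime_factors(360)
```

For example, prime_factors(360) returns the list [2, 2, 2, 3, 3, 5], and prime_factors(97) returns [97].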

The two proofs look slightly different from each other, because the algorithm 
is not mentioned in the statement of the theorem and has been partially suppressed 
in the formal proof. A better statement of the theorem would exhibit the algorithm: 


(2.8) Theorem. The procedure of factoring an integer > 1 terminates after 
finitely many steps. 


In this formulation, the formal proof becomes identical with the informal one. □


Proofs by contradiction proceed by assuming that the desired conclusion is
false and deriving a contradiction from this assumption. The conclusion must therefore
be true. We can, for example, rewrite the proof given above that a group of order
4 is abelian, in this way:

Proof of (2.2), Rewritten. We suppose that G is a nonabelian group of order 4,
and we proceed to derive a contradiction from this assumption. Since G is not abelian,
there are elements x, y ∈ G such that xy ≠ yx. Then y can not be any one of
the elements 1, x, x^{-1}, because those elements commute with x. Similarly, x is not
equal to 1, y, or y^{-1}. We may now check that the elements 1, x, y, xy, yx are distinct.
This contradicts the hypothesis that |G| = 4. Therefore there does not exist a
nonabelian group of order 4. □


Notice that there is no real difference between the two proofs of (2.2). The proof
just given is really a fake contradiction argument, and, though logically correct, it is
not aesthetically pleasing. One should avoid writing proofs in this way. On the other
hand, there are true proofs by contradiction, in which the proof is not easily turned
around to eliminate the contradiction. The proof given in the text [Chapter 6 (1.13)]
that a group of order p^2, p a prime, is abelian is an example, as is the proof of (3.11)
given below.


3. TOPOLOGY 


This section reviews some concepts from topology which we will need from time to
time. The sets which we want to study are subsets of Euclidean space R^k.

Let r be a positive real number. The open ball of radius r about a point
X ∈ R^k is the set of all points whose distance to X is less than r:


(3.1) B_{X,r} = {X' ∈ R^k | |X' − X| < r}.


A subset U of R^k is called open if whenever a point X lies in U, the points sufficiently
near to X also lie in U. In other words, U is open if it satisfies the following
condition:


(3.2) If X ∈ U and if r is sufficiently small, then B_{X,r} ⊂ U.


The radius r will depend on the point X.
Open sets have the following properties:
Open sets have the following properties: 


(3.3)


(i) The union of an arbitrary family of open sets is open.
(ii) The intersection of finitely many open sets is open.


The whole space R^k and the empty set ∅ are the simplest examples of open
sets. Some more interesting open sets are obtained in this way: Let f be a continuous
function R^k → R. Then the sets


(3.4) {f > 0}, {f < 0}, {f ≠ 0}

are open. For instance, if f(X) > 0, then f(X') > 0 for all X' near X, because f is
continuous. This shows that the general linear group GL_2(R) is an open subset of the
space R^4 of all 2 × 2 matrices, because it is the set {det P ≠ 0}. Also, the open ball
B_{X,r} is an open set in R^k, because it is defined by the inequality |X' − X| − r < 0.

Let S be any set in R^k. We will also need the concept of open subset of S. By
definition, a subset V of S is called open in S if whenever it contains a point X, then
it also contains all points of S which are sufficiently near to X. This condition is
explained by the following lemma:


(3.5) Lemma. Let V be a subset of a set S in R^k. The following conditions on V
are equivalent. If either one of them holds, then V is called an open subset of S:


(i) V = U ∩ S for some open set U of R^k;
(ii) For every point X ∈ V, there is an r > 0 so that V contains the set B_{X,r} ∩ S.


Proof. Assume that V = U ∩ S for some open set U of R^k. Let X ∈ V. Then
X ∈ U, and (3.2) guarantees the existence of an r > 0 such that B_{X,r} ⊂ U. So
B_{X,r} ∩ S ⊂ U ∩ S = V, and (ii) is verified. Conversely, suppose that (ii) holds.
For each X ∈ V, choose an open ball B_{X,r} such that B_{X,r} ∩ S ⊂ V, with the radius r
depending as usual on the point X. Let U be the union of these balls. Then U is an
open set in R^k (3.3i), and U ∩ S ⊂ V. On the other hand, X ∈ B_{X,r} ∩ S ⊂ U ∩ S
for every X ∈ V. Therefore V ⊂ U ∩ S, and V = U ∩ S as required. □


Open subsets of S have the properties (3.3), which follow from the same properties
of open subsets of R^k because of (3.5i).

It is customary to speak of an open set V of S which contains a given point p as
a neighborhood of p in S.

A subset C of a set S is called closed if its complement (S − C) is open. For
example, let f_i: R^k → R (i = 1, ..., k) be continuous functions. The locus


(3.6) {f_1 = f_2 = ... = f_k = 0}


of solutions to the system of k equations f_i = 0 is a closed set in R^k, because its
complement is the union of the open sets {f_i ≠ 0}. The 2-sphere {x_1^2 + x_2^2 + x_3^2 = 1} is
an example of a closed set in R^3. So is the rotation group SO_2. It is the locus in R^{2×2}
defined by the five equations


x_11 x_22 − x_12 x_21 = 1,   x_11^2 + x_12^2 = 1,   x_21^2 + x_22^2 = 1,
x_11 x_21 + x_12 x_22 = 0,   x_21 x_11 + x_22 x_12 = 0.

Closed sets have properties dual to (3.3):

(3.7)


(i) The intersection of an arbitrary family of closed sets is closed.
(ii) The union of finitely many closed sets is closed.


These rules follow from (3.3) by complementation.

A subset C of R^k is called bounded if the coordinates of the points in C are
bounded, meaning that there is a positive real number b, a bound, such that for
X = (x_1, ..., x_k) ∈ C,


(3.8) |x_i| ≤ b,


for all i = 1, ..., k. If C is both closed and bounded, it is called a compact subset of
R^k. The unit 2-sphere is a compact set in R^3.

Let $S$, $T$ be subsets of $\mathbb{R}^m$ and $\mathbb{R}^n$. A map $f\colon S \to T$ is called continuous if it
carries nearby points of $S$ to nearby points of $T$. Formally, the property of continuity is
stated this way:


(3.9) Let $s \in S$. For every real number $\epsilon > 0$, there is a $\delta > 0$ such that if
$s' \in S$ and $|s' - s| < \delta$, then $|f(s') - f(s)| < \epsilon$.


The easiest way to get a continuous map from $S$ to $T$ is as a restriction of a continuous
map $F\colon \mathbb{R}^m \to \mathbb{R}^n$ which happens to carry $S$ to $T$. Most of the maps we use are of
this form. For example, the determinant is a continuous function from any one of
the classical groups to $\mathbb{R}$ or $\mathbb{C}$.

A map $f\colon S \to S'$ is called a homeomorphism if it is bijective and if $f^{-1}$, as well
as $f$, is continuous.

For example, the unit circle $S^1$ in $\mathbb{R}^2$ is homeomorphic to the rotation group
$SO_2$. The homeomorphism $f\colon S^1 \to SO_2$ is given by restricting the map


$$F(x_1, x_2) = \begin{bmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{bmatrix},$$


which carries $\mathbb{R}^2$ to the space $\mathbb{R}^4$ of $2 \times 2$ matrices. The map $F$ is not bijective and
is therefore not a homeomorphism, but it restricts to a homeomorphism $f$ on the subsets
$S^1$ and $SO_2$. Its inverse is the restriction to $SO_2$ of the projection $G\colon \mathbb{R}^4 \to \mathbb{R}^2$
which sends a $2 \times 2$ matrix to its top row. (The word homeomorphism must not be
confused with homomorphism!)
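
The circle example can be checked numerically. The following is only a sketch: it assumes that the map sends $(x_1, x_2)$ to the matrix with top row $(x_1, x_2)$ and bottom row $(-x_2, x_1)$, so that projection to the top row inverts it. The function names `F`, `G`, `in_SO2`, and `check_circle` are chosen here for illustration.

```python
import math

def F(x1, x2):
    """Map a point of R^2 to a 2x2 matrix, stored as two rows (assumed form)."""
    return ((x1, x2), (-x2, x1))

def G(m):
    """Project a 2x2 matrix to its top row."""
    return m[0]

def in_SO2(m, tol=1e-12):
    """Test determinant 1 and orthonormal columns, up to rounding error."""
    (a, b), (c, d) = m
    det_ok = abs(a * d - b * c - 1.0) < tol
    orth_ok = (abs(a * a + c * c - 1.0) < tol
               and abs(b * b + d * d - 1.0) < tol
               and abs(a * b + c * d) < tol)
    return det_ok and orth_ok

def check_circle(samples=360):
    """Check on sample points of the unit circle that F lands in SO_2 and G undoes F."""
    for k in range(samples):
        t = 2 * math.pi * k / samples
        p = (math.cos(t), math.sin(t))
        m = F(*p)
        if not in_SO2(m) or G(m) != p:
            return False
    return True
```

Away from the circle, `F` still returns a matrix, but `in_SO2` rejects it, which matches the remark that only the restriction to the circle and to the rotation group is a homeomorphism.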

A path is a continuous map $f\colon [0,1] \to \mathbb{R}^k$ from the unit interval to the space
$\mathbb{R}^k$, and the path is said to lie in $S$ if $f(t) \in S$ for every $t \in [0,1]$. A subset $S$ of $\mathbb{R}^k$
is called path-connected if every pair of points $p, q \in S$ can be joined by a path lying
in $S$. In other words, for every pair of points $p, q \in S$, there is a path $f$ such that


(3.10)
(i) $f(t) \in S$ for all $t$ in the interval;
(ii) $f(0) = p$ and $f(1) = q$.

Here is the most important property of path-connected sets:

(3.11) Proposition. A path-connected set $S$ is not the disjoint union of proper
open subsets. In other words, suppose that

$$S = \bigcup_i V_i,$$

where $V_i$ are open sets in $S$ and $V_i \cap V_j = \emptyset$ if $i \ne j$. Then all but one of the sets $V_i$
is empty.


Proof. Suppose that two of the sets are nonempty, say $V_0$ and $V_1$. We set aside
$V_0$ and replace $V_1$ by the union of the remaining subsets, which is open by (3.3).
Then $V_0 \cup V_1 = S$ and $V_0 \cap V_1 = \emptyset$. This reduces to the case that there are exactly
two open sets.

Choose points $p \in V_0$ and $q \in V_1$, and let $f\colon [0,1] \to S$ be a path in $S$ connecting
$p$ to $q$. We will obtain a contradiction by examining the path at the point
where it leaves $V_0$ for the last time.

Let $b$ be the least upper bound of all $t \in [0,1]$ such that $f(t) \in V_0$, and let
$X = f(b)$. If $X \in V_0$, then all points of $B_{X,r} \cap S$ are in $V_0$, if $r$ is small enough.
Since $f$ is continuous, $f(t) \in B_{X,r}$ for all $t$ sufficiently near $b$. So $f(t) \in V_0$ for these
points. Taking $t$ slightly larger than $b$ contradicts the choice of $b$ as an upper bound
of the points mapping to $V_0$. Therefore $X$ is not in $V_0$, so it has to be in $V_1$. But reasoning
in the same way, we find that $f(t) \in V_1$ for all $t$ sufficiently near $b$. Taking $t$
slightly smaller than $b$ contradicts the choice of $b$ as the least upper bound of points
mapping to $V_0$. This contradiction completes the proof. □

The final concept from topology is that of manifold. 


(3.12) Definition. A subset $S$ of $\mathbb{R}^n$ is called a manifold of dimension $d$ if every
point $p$ of $S$ has a neighborhood in $S$ which is homeomorphic to an open set in $\mathbb{R}^d$.


For example, the sphere $\{(x,y,z) \mid x^2 + y^2 + z^2 = 1\}$ is a two-dimensional manifold.
The half sphere $U = \{z > 0\}$ is open in $S^2$ (3.4, 3.5) and projects continuously
to the unit ball $B_{0,1} = \{x^2 + y^2 < 1\}$ in $\mathbb{R}^2$. The inverse function
$z = \sqrt{1 - x^2 - y^2}$ is continuous. Therefore $U$ is homeomorphic to $B_{0,1}$. Since the
2-sphere is covered by such half spheres, it is a manifold.
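
The chart on the upper half sphere can be written out directly. This is a minimal sketch with illustrative names (`project`, `lift`); it only spot-checks that the two maps are mutually inverse and says nothing about continuity, which is argued in the text.

```python
import math

def project(p):
    """Send a point (x, y, z) of the upper half sphere to (x, y) in the open unit disc."""
    x, y, z = p
    return (x, y)

def lift(q):
    """Inverse chart: send (x, y) with x^2 + y^2 < 1 to the point above it on the sphere."""
    x, y = q
    s = 1.0 - x * x - y * y
    if s <= 0.0:
        raise ValueError("point is not in the open unit disc")
    return (x, y, math.sqrt(s))

# A sample point of the disc and the point of the half sphere lying over it.
q = (0.3, -0.4)
p = lift(q)
```

Here `project(p)` returns `q` again, and `p` has positive third coordinate and squared length 1 up to rounding. The other five half spheres (`z < 0`, `x > 0`, and so on) are handled the same way with the coordinates permuted.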
The figure below shows a set which is not a manifold. It becomes a manifold 
of dimension 1 when the point p is deleted. Note that homogeneity is false for this 
set. It looks different near p from how it looks near the other points. 


(3.13) Figure. A set which is not a manifold.


4. THE IMPLICIT FUNCTION THEOREM


The Implicit Function Theorem is used at two places in this book, so we state it here 
for reference. 


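(4.1) Theorem. Implicit Function Theorem: Let $f(x,y) = (f_1(x,y),\dots,f_r(x,y))$ be
functions of $n + r$ real variables $(x,y) = (x_1,\dots,x_n,y_1,\dots,y_r)$, which have continuous
partial derivatives in an open set of $\mathbb{R}^{n+r}$ containing the point $(a,b)$. Assume
that $f(a,b) = 0$ and that the Jacobian determinant


$$\det\begin{bmatrix} \dfrac{\partial f_1}{\partial y_1} & \cdots & \dfrac{\partial f_1}{\partial y_r} \\ \vdots & & \vdots \\ \dfrac{\partial f_r}{\partial y_1} & \cdots & \dfrac{\partial f_r}{\partial y_r} \end{bmatrix}$$


is not zero at the point $(a,b)$. There is a neighborhood $U$ of the point $a$ in $\mathbb{R}^n$ such
that there are unique continuously differentiable functions $Y_1(x),\dots,Y_r(x)$ on $U$
satisfying

$$f(x, Y(x)) = 0 \quad\text{and}\quad Y(a) = b.$$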
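
As a quick illustration of (4.1), not taken from the text, consider the unit circle with $n = r = 1$. The function $f$ and the point $(a,b)$ below are chosen only for the example.

```latex
\begin{align*}
f(x,y) &= x^2 + y^2 - 1, \qquad (a,b) = (0,1), \qquad f(a,b) = 0,\\
\frac{\partial f}{\partial y}(a,b) &= 2b = 2 \neq 0,\\
Y(x) &= \sqrt{1 - x^2} \quad \text{on } U = (-1,1), \qquad f(x, Y(x)) = 0, \quad Y(0) = 1.
\end{align*}
```

At the point $(1,0)$ the derivative $\partial f/\partial y$ vanishes, and no function $Y(x)$ with $Y(1) = 0$ solves $f = 0$ on a full neighborhood of $x = 1$, because the circle has no points with $x > 1$.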


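The Implicit Function Theorem is closely related to the Inverse Function Theorem,
which is used in Chapter 8 (5.8):


(4.2) Theorem. Inverse Function Theorem: Let $f$ be a continuously differentiable
map from an open set $U$ of $\mathbb{R}^n$ to $\mathbb{R}^n$. Assume that the Jacobian determinant


$$\det\begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_n}{\partial x_1} & \cdots & \dfrac{\partial f_n}{\partial x_n} \end{bmatrix}$$


is not zero at a point $a \in \mathbb{R}^n$. There is a neighborhood of $a$ on which $f$ has a
continuously differentiable inverse function.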


We refer to the book by Rudin listed in the Suggestions for Further Reading for
proofs of these two theorems.


We also use the following complex analogue of the Implicit Function Theorem 
in one place [Chapter 13 (8.14)]: 


(4.3) Theorem. Let $f(x,y)$ be a complex polynomial. Suppose that for some
$(a,b) \in \mathbb{C}^2$, we have $f(a,b) = 0$ and $\dfrac{\partial f}{\partial y}(a,b) \ne 0$. There is a neighborhood $U$ of $a$
in $\mathbb{C}$ on which a unique continuous function $Y(x)$ exists having the properties

$$f(x, Y(x)) = 0 \quad\text{and}\quad Y(a) = b.$$


Since references for this extension are not so common, we will give a proof which 
reduces it to the real Implicit Function Theorem. The method is simply to write ev- 
erything in terms of its real and imaginary parts and then to verify the hypotheses of 
(4.1). The same argument will apply with more variables. 


Proof. We write $x = x_0 + x_1 i$, $y = y_0 + y_1 i$, $f = f_0 + f_1 i$, where $f_i =
f_i(x_0, x_1, y_0, y_1)$ is a real-valued function of four real variables. We are to solve the
pair of equations $f_0 = f_1 = 0$ for $y_0, y_1$ as functions of $x_0, x_1$. According to (4.1), we
have to prove that the Jacobian determinant


$$\det\begin{bmatrix} \dfrac{\partial f_0}{\partial y_0} & \dfrac{\partial f_0}{\partial y_1} \\ \dfrac{\partial f_1}{\partial y_0} & \dfrac{\partial f_1}{\partial y_1} \end{bmatrix}$$


is not zero at $(a,b)$. Since $f$ is a polynomial in $x, y$, the real functions $f_i$ are also
polynomials in $x_j, y_j$. So they have continuous derivatives.


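(4.4) Lemma. Let $f(x,y)$ be a polynomial with complex coefficients. With the
above notation,


(i) $\dfrac{\partial f}{\partial y} = \dfrac{\partial f_0}{\partial y_0} + \dfrac{\partial f_1}{\partial y_0}\, i$, and


(ii) the Cauchy–Riemann equations $\dfrac{\partial f_0}{\partial y_0} = \dfrac{\partial f_1}{\partial y_1}$ and $\dfrac{\partial f_0}{\partial y_1} = -\dfrac{\partial f_1}{\partial y_0}$ hold.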


Proof of the Lemma. Since $f$ is a polynomial and since the derivative of a sum
is the sum of the derivatives, it is enough to prove the lemma for the monomials
$c y^n = (c_0 + c_1 i)(y_0 + y_1 i)^n$. For these monomials, the lemma follows from the
product rule for differentiation, by induction on $n$. □

We return to the proof of Theorem (4.3). By hypothesis, $f_i(a_0, a_1, b_0, b_1) = 0$.
Also, since $\dfrac{\partial f}{\partial y}(a,b) \ne 0$, we know by (4.4i) that $\dfrac{\partial f_0}{\partial y_0} = d_0$ and $\dfrac{\partial f_1}{\partial y_0} = d_1$ are not
both zero. By (4.4ii), the Jacobian determinant is


$$\det\begin{bmatrix} d_0 & -d_1 \\ d_1 & d_0 \end{bmatrix} = d_0^2 + d_1^2 > 0.$$


This shows that the hypotheses of the Implicit Function Theorem (4.1) are satisfied. □
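
The key step of this proof, that the real Jacobian determinant equals $d_0^2 + d_1^2$, can be spot-checked numerically. The sketch below is not part of the argument: the polynomial $y^3 + xy + 1$, the step size `h`, and all function names are assumptions made for illustration, and the partial derivatives are estimated by central differences rather than computed exactly.

```python
def f(x, y):
    """Sample complex polynomial f(x, y) = y^3 + x*y + 1 (chosen only for illustration)."""
    return y**3 + x * y + 1

def df_dy(x, y):
    """Complex partial derivative of the sample polynomial with respect to y."""
    return 3 * y**2 + x

def real_jacobian(x, y, h=1e-6):
    """Central-difference estimate of the partials of (f0, f1) with respect to (y0, y1)."""
    def parts(y0, y1):
        w = f(x, complex(y0, y1))
        return w.real, w.imag

    y0, y1 = y.real, y.imag
    a_plus, a_minus = parts(y0 + h, y1), parts(y0 - h, y1)
    b_plus, b_minus = parts(y0, y1 + h), parts(y0, y1 - h)
    df0_dy0 = (a_plus[0] - a_minus[0]) / (2 * h)
    df1_dy0 = (a_plus[1] - a_minus[1]) / (2 * h)
    df0_dy1 = (b_plus[0] - b_minus[0]) / (2 * h)
    df1_dy1 = (b_plus[1] - b_minus[1]) / (2 * h)
    return ((df0_dy0, df0_dy1), (df1_dy0, df1_dy1))

def jacobian_det(x, y):
    """Determinant of the estimated 2x2 real Jacobian."""
    (a, b), (c, d) = real_jacobian(x, y)
    return a * d - b * c
```

For any sample point, `jacobian_det(x, y)` should agree with `abs(df_dy(x, y)) ** 2` up to the finite-difference error, and the entries of `real_jacobian(x, y)` should satisfy the two Cauchy–Riemann equations of (4.4ii) to the same accuracy.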
EXERCISES


1. Set Theory


1. Let $\varphi\colon \mathbb{Z} \to \mathbb{R}$ be the map defined by $\varphi(n) = n^3 - 3n + 1$.
(a) Is $\varphi$ injective?
(b) Determine $\varphi^{-1}(U)$, where $U$ is the interval (i) $[0, \infty)$, (ii) $[2, 4]$, (iii) $[4, 12]$.

2. Give an example of a map $\varphi\colon S \to S$ from an infinite set to itself which is surjective but
not injective, and one which is injective but not surjective.

3. Let $\varphi\colon S \to T$ be a map of sets.
(a) Let $U$ be a subset of $T$. Prove that $\varphi(\varphi^{-1}(U)) \subset U$ and that if $\varphi$ is surjective, then
$\varphi(\varphi^{-1}(U)) = U$.
(b) Let $V$ be a subset of $S$. Prove that $\varphi^{-1}(\varphi(V)) \supset V$ and that if $\varphi$ is injective, then
$\varphi^{-1}(\varphi(V)) = V$.

4. Let $\varphi\colon S \to T$ be a map of sets. A map $\psi\colon T \to S$ is called a left inverse if $\psi \circ \varphi\colon S \to S$ is
the identity and a right inverse if $\varphi \circ \psi\colon T \to T$ is the identity. Prove that $\varphi$ has a left
inverse if and only if it is injective and has a right inverse if and only if it is surjective.

5. Let $S$ be a partially ordered set.
(a) Prove that if $S$ contains an upper bound $b$ for $S$, then $b$ is unique, and also $b$ is a
maximal element.
(b) Prove that if $S$ is totally ordered, then a maximal element $m$ is an upper bound for $S$.

6. (a) Describe precisely which real numbers have more than one decimal expansion and
how many expansions such a number has.
(b) Fix the proof of Proposition (1.7).

7. Use Zorn’s Lemma to prove that every ideal $I \ne R$ is contained in a maximal ideal. Do
this by showing that the set $S$ of all ideals $I \ne R$, ordered by inclusion, is inductive.


2. Techniques of Proof 


1. Use induction to find a closed form for each of the following expressions.
(a) $1 + 3 + 5 + \cdots + (2n + 1)$

(b) $1^2 + 2^2 + 3^2 + \cdots + n^2$

(cyl +2 1/3 + ae 


1 
(d) 
2. Prove that $1^3 + 2^3 + \cdots + n^3 = (n(n + 1))^2/4$.
3. Prove that $1/(1 \cdot 2) + 1/(2 \cdot 3) + \cdots + 1/(n(n + 1)) = n/(n + 1)$.
4. Let $S$, $T$ be finite sets.

(a) Let $\varphi\colon S \to T$ be an injective map. Prove by induction that $|S| \le |T|$ and that if
$|S| = |T|$, then $\varphi$ is bijective.

(b) Let $\varphi\colon S \to T$ be a surjective map. Prove by induction that $|S| \ge |T|$ and that if
$|S| = |T|$, then $\varphi$ is bijective.

5. Let $n$ be a positive integer. Show that if $2^n - 1$ is a prime number, then $n$ is prime.
6. Let $a_n = 2^{2^n} + 1$. Prove that $a_n = a_0 a_1 \cdots a_{n-1} + 2$.

7. A polynomial with rational coefficients is called irreducible if it is not constant and if it is 
not a product of two nonconstant polynomials whose coefficients are rational numbers. 
Use complete induction to prove that every polynomial with rational coefficients can be 
written as a product of irreducible polynomials. 


8. Prove parts (b) and (c) of Theorem (1.6). 


3. Topology 


1. Let $S$ be a subset of $\mathbb{R}^k$, and let $f$, $g$ be continuous functions from $S$ to $\mathbb{R}$. Determine
whether or not the following subsets are open or closed in $S$.

(a) $\{f(X) = 0\}$ (b) $\{f(X) \ne 2\}$ (c) $\{f(X) < 0,\ g(X) > 0\}$
(d) $\{f(X) = 0,\ g(X) < 0\}$ (e) $\{f(X) \ne 0,\ g(X) = 0\}$ (f) $\{f(X) \in \mathbb{Z}\}$
(g) $\{f(X) \in \mathbb{Q}\}$

2. Let $X \in \mathbb{R}^n$. Determine whether or not the following sets are open or closed.
(a) $\{rX \mid r \in \mathbb{R},\ r > 0\}$ (b) $\{rX \mid r \in \mathbb{R},\ r \ge 0\}$

3. (a) Let $P = (p_{ij})$ be an invertible matrix, and let $d = \det P$. We can define a map
$GL_n(\mathbb{R}) \to \mathbb{R}^{n^2+1}$ by sending $P \mapsto (p_{ij}, d^{-1})$. Show that this rule embeds $GL_n(\mathbb{R})$ as a
closed set in $\mathbb{R}^{n^2+1}$.

(b) Illustrate this map in the case of $GL_1(\mathbb{R})$.

4. Prove that the product $M \times M'$ of two manifolds $M$, $M'$ is a manifold.

5. Show that $SL_2(\mathbb{R})$ is not a compact group.

6. (a) Sketch the curve C: x3 = x} — x7? in R?. 

(b) Prove that this locus is a manifold of dimension 1 if the origin is deleted.


4. The Implicit Function Theorem 


1. Prove Lemma (4.4). 
2. Prove that SL2(R) is a manifold, and determine its dimension. 
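3. Let $f(x,y)$ be a complex polynomial. Assume that the equations

$$f = 0, \quad \frac{\partial f}{\partial x} = 0, \quad \frac{\partial f}{\partial y} = 0$$

have no common solution in $\mathbb{C}^2$. Prove that the locus $f = 0$ is a manifold of dimension 2.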


Notation 




the alternating group, Chapter 2 (4.7) 

the open ball of radius r about the point X, Appendix (3.1) 

the field of complex numbers, Chapter 2 (1.11) 

the cyclic group of order n, Chapter 5 (3.4) 

the dihedral group, Chapter 5 (3.4) 

determinant, Chapter 1 (3.4) 

the prime field Z/( p), Chapter 3 (2.4) 

the general linear group, Chapter 2 (1.13) 

the identity matrix, Chapter 1 (1.14) 

the icosahedral group, Chapter 5 (9.1) 

the image of the map φ, Appendix (1.3)

the kernel of the homomorphism ¢, Chapter 2 (4.5) 

the space of bounded sequences, Chapter 3 (5.2) 

the group of motions of the plane, Chapter 4 (5.15), Chapter 5 (2.1) 
the normalizer of H, Chapter 6 (3.7) 

the set of positive integers, or natural numbers, Chapter 10 (2.1) 
the orthogonal group, Chapter 5 (5.3), Chapter 8 (1.3) 

the Lorentz group, Chapter 8 (1.4) 

the projective group, Chapter 8 (8.2) 

the field of real numbers, Chapter 2 (1.11) 

the space of n-dimensional vectors, Chapter 3 (1.1) 

the symmetric group, Chapter 2 (1.14) 

the n-dimensional sphere, Chapter 8 (2.6) 

the special linear group, Chapter 2 (4.6), Chapter 8 (1.8) 

the special orthogonal group, Chapter 4 (5.4), Chapter 8 (1.8) 
the symplectic group, Chapter 8 (1.6) 

the special unitary group, Chapter 8 (1.8) 

the tetrahedral group, Chapter 5 (9.1) 

(superscript t) the transpose of a matrix, Chapter 1 (2.24)

trace, Chapter 4 (4.18) 

the unitary group, Chapter 7 (4.15), Chapter 8 (1.8) 

the center of a group, Chapter 2 (4.10) 

the ring of integers, Chapter 2 (1.11) 

the centralizer of x, Chapter 6 (1.5)

If A is a complex matrix, then A* = Ā^t, Chapter 7 (4.7)

In a matrix display, * denotes an undetermined entry, Chapter 1 (1.15)
The starred exercises are some of the more difficult ones.
(superscript +) The group whose law of composition is addition,
Chapter 2 (1.1)

(superscript ×) The group whose law of composition is multiplication,
Chapter 2 (1.1)

direct sum, Chapter 3 (6.4), Chapter 12 (6.3) 
factorial; n! is the product of the integers 1, 2,..., n. 
$\binom{n}{k}$ a binomial coefficient, Appendix (2.1)
[μ] the largest integer ≤ μ, Chapter 11 (10.23)


If S and T are sets, we use the following notation:


|S|   the number of elements, also called the order of the set S.

s ∈ S   s is an element of S.

S ⊂ T   S is a subset of T, or S is contained in T. In other words, every element of
S is also an element of T.

T ⊃ S   T contains S, which is the same as S ⊂ T.

S < T   S is a proper subset of T, meaning that it is a subset, and that T contains
an element which is not a member of S.

T > S   This is the same as S < T.

T − S   This notation is used only when S is a subset of T, and then it denotes the
complement of S in T, the set of all elements which are in T but not in S:


T − S = {x | x ∈ T but x ∉ S}.


S ∩ T   The intersection of the sets S and T, which is the set of all elements in
common to S and T.

S ∪ T   The union of the sets S and T, which is the set of all elements x which are
contained in at least one of the sets S and T.

S × T   the product set. Its elements are ordered pairs (s, t) of elements:


S × T = {(s, t) | s ∈ S, t ∈ T}.


Since the parentheses have other meanings, we sometimes leave them off,
and denote an element of the product set by s, t.

φ: S → T   a map φ from S to T, or a function whose domain is S and whose range is T.

s ⇝ t   The wiggly arrow indicates that the map under consideration sends the
element s to the element t, i.e., that φ(s) = t.

□   This symbol indicates that a digression in the text, such as a proof or an
example, has ended, and that the text returns to the main thread.
Abel, 570
Abelian character, 325
Abelian group, 451
Abelian groups, Structure Theorem, 472
Addition:
in a field, 83
matrix, 2
in a module, 450
in a ring, 346
vector, 78, 86
Adjoint matrix, 29, 250
Adjoint representation, 304
Adjunction:
of an element, 365
symbolic, 506
Affine group, 306
Algebra:
Fundamental Theorem of, 527
Lie, 291
Algebraically closed field, 527
Algebraically dependent, 525
Algebraically independent, 525
Algebraic closure, 527
Algebraic curve, 376
irreducible, 386
Algebraic element, 493
Algebraic extension, 499
Algebraic geometry, 373
Algebraic group, 289, 299
Algebraic integer, 410
Algebraic number, 345
Algebraic variety, 373
Algorithm, Todd–Coxeter, 223
Almost everywhere, 516
Alternating group, 52
Angle:
between vectors, 126, 248
trisection of, 505
Annihilator, 484
Antipodal point, 277
Arithmetic:
Fundamental Theorem of, 390
modular, 64
Arrow, 586
wiggly, 586
Ascending chain condition, 393, 467
Associate elements, 392
Associative law, 5, 39
Automorphism, 176
of a field, 539
of a group, 50
Averaging over a group, 311
Axiomatic characterization of determinant, 23
Axiom of choice, 101, 374, 588
Axioms, Peano, 348


Baker, 416
Ball, open, 593
Basis:
change of, 98
of a module, 454
orthogonal, 244
orthonormal, 126, 241
standard, 26, 90, 454
symplectic, 261 
theorem, 469 
transcendence, 525 
of a vector space, 90 

Bezout bound, 376 

Bijection, 586

Bijective map, 586 

Bilateral symmetry, 155 

Bilinear form, 238 

Binomial coefficient, 589 

Biquadratic extension, 539 

Block, Jordan, 480 

Block multiplication, 8 

Bound, upper, 588 

Bounded set, 595 

Bracket, Lie, 290, 291 

Branched covering, 378, 520 
isomorphism of, 519 

Branch points, 521 

Bruhat decomposition, 236 

Bundle, vector, 483 

Burnside’s Formula, 196 


Cancellation Law, 42, 84, 369 
for ideals, 422 
Canonical form, rational, 479 
Cantor, 587 
Cardano’s Formula, 544 
Cardinality of a set, 586 
Case analysis, 589 
Cauchy—Riemann equations, 598 
Cayley–Hamilton Theorem, 153, 488
Cayley’s Theorem, 197 
Cayley transform, 306 
Center: 
of gravity, 163 
of a group, 52 
Centralizer, 198 
Centrally symmetric set, 426
Chain condition, ascending, 393, 467
Change of basis, 98
matrix of, 98
Character, 316 
abelian, 325 
dimension of, 317 
irreducible, 316 
Character group, 325 
Characteristic: 
of a field, 86 
of a ring, 358
Characteristic polynomial, 122 
Characteristic subgroup, 234 
Characteristic value, 117 
Characteristic vector, 117 
Character table, 320 
Chinese remainder theorem, 303, 441 
Choice, axiom of, 101, 374 


Circulant, 268 
Class: 
congruence, 56, 64 
conjugacy, 198 
equivalence, 54 
ideal, 417, 425 
isomorphism, 49 
residue, 64 
Class Equation, 198 
Class function, 318 
Class group, 426 
Class number, 417, 426 
Classical group, 270 
Classification of groups, 49, 299 
Closed set, 594 
Closed word, 233 
Closure, algebraic, 527 
Coefficient, leading, 350 
Column index, 1
Column vector, 2 
Combination, linear, 87 
Commutative law, 39 
Commutative ring, 346 
Commutator, 222 
Commutator subgroup, 234 
Compact group, 313 
Compact set, 595 
Complement, orthogonal, 243 


Complete expansion of the determinant, 28 


Complete induction, 380, 592 
Complete set of relations, 464 
Complex algebraic group, 299 
Complex representation, 310 
Component, connected, 77 
Composition, law of, 39 
Conductor, 387 
Congruence: 

class, 56, 64 

of integers, 64 
Congruent matrices, 270 
Conic, 255 
Conjugacy class, 198 
Conjugate element, 51 
Conjugate linearity, 250 
Conjugate representation, 309 
Conjugate subfield, 559 
Conjugate subgroup, 180 
Conjugation, 50, 198 
Connected component, 77 
Connected set, 595 
Connected, simply, 278 
Constructible point, line, circle, 500 
Constructible real number, 502 
Construction, ruler and compass, 500 
Content, 399 
Continuous function, map, 595 
Continuous representation, 313 
Contradiction, proof by, 592 


Index 




Convex set, 427 
Coordinates, 94 
Coordinate vector, 94, 455 


Correspondence theorem, 75, 360, 452 


Coset, 57 
double, 77 
left, 57 
right, 59 
Coset multiplication, 68 
Coset space, 178 
Counting Formula, 58, 180 
Covering: 
branched, 378, 520 
Cramer’s Rule, 31 
Crystallographic group, 172, 187 
Crystallographic restriction, 169 
Crystal system, 187 
Cubic, resolvent, 564 
Cubic equation, 543 
Cubic extension, 497 
Curve, algebraic, 376 
Cut and paste, 520 
Cycle: 
decomposition, 213 
notation, 213 
Cyclic group, 46, 164, 184 
Cyclic permutation, 25 
Cyclotomic field or extension, 567 
Cyclotomic polynomial, 405 


Decomposition, polar, 304 
Defining relations for a group, 221 
Definition, 585 
inductive or recursive, 348 
Degree: 
of an algebraic curve, 387 
of an element, 497 
of a field extension, 497 
of a polynomial, 350 
of a rational function, 535 
transcendence, 526 
weighted, 550 
Dependence, linear, 88, 101 
Determinant, 20, 453 
axiomatic characterization, 23 
complete expansion of. 28 
of an operator, 123 
Vandermonde, 36 
Diagonal entries of a matrix, 6 
Diagonalization, 130, 458 
Diagonal matrix, 6 
Dichotomy, 589 
Differential equation, 135 
Dihedral group, 164, 184 
Dimension: 
of a character, 317 
of a linear group, 293 
of a manifold, 596 


of a representation, 308 

of a vector space, 93 
Dimension formula, 110 
Diophantine equation, 410, 437 
Direct sum: 

of representations, 315 

of submodules, 471 

of subspaces, 102 
Discrete group of motions, 166, 167 
Discriminant, 548 

of a cubic, 546 

of a quadratic number field, 413 
Distance between vectors, 125 
Distinct elements, 585
Distributive law, 5
Divide and conquer, 589
Divisor, 392

greatest common, 46, 395 

proper, 392 

zero, 368 
Domain: 

Euclidean, 397 

fundamental, 195 

integral, 368 

of a map, 585 

principal ideal, 396 

unique factorization, 394 
Dot product, 125, 237 
Double coset, 77 
Double covering, 277 


Echelon matrix, 14 

Eigenvalue, 117 

Eigenvector, 117 

Eisenstein Criterion, 404 

Element: 
algebraic, 493 
associate, 392 
conjugate, 51
of a field extension, primitive, 552
ideal, 356
idempotent, 382
identity, 41
image of, 585 
infinitesimal, 365 
invertible, 42 
irreducible, 392 
of a lattice, primitive, 17 
maximal, 388 
nilpotent, 365 
norm of, 414 
order of , 47 
prime, 395 
representative, 55 
transcendental, 493 
unipotent, 381 
unit, 347 
Elementary column operation, 18
Elementary matrix, 11 
Elementary row operation, 12 
Elementary symmetric function, 547 
Elements: 

distinct, 585 

independent, 454 
Elimination, Gaussian, 12 
Ellipsoid, 258 
Entries: 

diagonal, 6 

of a matrix, 1
Equation: 

class, 198 

Diophantine, 437 

homogeneous, 16 

linear, 4 

quartic, 560 

quintic, 570 
Equations, Cauchy—Riemann, 598 
Equivalence class, 53 
Equivalence relation, 53 

determined by a map, 55 
Eratosthenes, sieve of, 403 
Euclidean domain, 397 
Euclidean space, 247
Euler, 410 
Evaluation of polynomials, 353 
Even permutation, 26 
Exceptional group, 299 
Existence of factorizations, 393 
Existence theorem, Riemann, 519 
Expansion by minors, 20 
Exponential of a matrix, 138 
Expressible by radicals, 571 
Extension: 

algebraic, 500 

biquadratic, 539 

cubic, 497 

cyclotomic, 567 

Galois, 546

Kummer, 566 

pure transcendental, 525 

quadratic, 497 

ring, 364 

transcendental, 525 
Extension field, 492 
External law of composition, 81 


Factorization: 
existence of, 392 
irreducible, 395
prime, 395 
Faithful module, 491 
Faithful operation, 183 
Faithful representation, 308 
Faltings, 437 
Fermat Equation, 409 


Fermat’s last theorem, 437 
Fermat’s Theorem, 105 
Fibonacci numbers, 154 
Fibration, Hopf, 280 
Fibre of a map, 55
Field, 83 
algebraically closed, 527 
automorphism of, 539 
characteristic of, 86 
cyclotomic, 567 
finite, 492, 509 
fixed, 540 
function, 493, 516 
intermediate, 542
number, 492 
order of, 509 
prime, 83 
splitting, 540 
Field extension, 492 
degree of, 497 
finite, 497 
generators of, 495 
Field extensions, isomorphism of, 496 
Field of fractions, 369 
Finite-dimensional vector space, 91 
Finite extension, 497 
Finite field, 492, 509 
Finite linear combination, 100 
Finitely generated module, 454 
Finite set, 586 
Finite simple group, 299 
First Isomorphism Theorem, 68, 360, 452 
Fixed field, 540 
Fixed point, 162 
Fixed Point Theorem, 162, 199 
Form: 
bilinear, 238 
Hermitian, 250 
indefinite, 243 
invariant, 311 
Jordan, 480
Killing, 304 
Lorentz, 243 
matrix of, 239 
nondegenerate, 244 
null space of, 244 
positive definite, 241, 252 
quadratic, 256 
restriction of, 248 
signature of, 245 
skew-symmetric, 238, 260 
symmetric, 238
Formal linear combination, 94 
Four group, 48 
Fraction, 369 
Fraction field, 369 
Fractions, partial, 441 
Free abelian group, 223 
Free group, 219
mapping property of, 220 
Free module, 454 
Free semigroup, 217 
Frobenius norm, 153 
Frobenius reciprocity, 343 
Function, 586 
class, 318 
continuous, 594 
inverse, 586 
multi-valued, 519 
partially symmetric, 561 
rational, 369, 516 
single-valued, 519 
size, 397 
successor, 348 
symmetric, 547 
Function field, 493, 516 
Fundamental domain, 195 
Fundamental Theorem: 
of Algebra, 527 
of Arithmetic, 390 


Galois, 570 
Galois extension, 540 
Galois group, 539, 558 
Galois theory, main theorem of, 542 
Gaussian elimination, 12 
Gauss integers, 345 
Gauss prime, 406 
Gauss’s Lemma, 400 
General linear group, 43, 453 
Generators: 
of a field extension, 495 
of a group, 220 
of a module, 454 
of a subgroup, 48 
Genus, 534 
G-invariant form, 311 
G-invariant subspace, 314 
G-invariant transformation, 325 
Glide reflection, 157 
Glide symmetry, 156 
Gram-Schmidt procedure, 241 
Gravity, center of, 163 
Greatest common divisor, 46, 395 
Group, 42 
abelian, 42 
affine, 306 
algebraic, 289, 299 
alternating, 52 
automorphism of, 50 
center of, 52 
character, 325 
class, 426 
classical, 270 
compact, 313 
complex algebraic, 299 


crystallographic, 172, 187 
cyclic, 46, 164, 184 
dihedral, 164, 184 
discrete, 166, 167 
exceptional, 299 
free, 219 
free abelian, 222 
Galois, 539, 558 
general linear, 43, 453 
generators of, 220 
icosahedral, 184 
ideal class, 429 
infinite cyclic, 46 
lattice, 172 
of Lie type, 300 
linear, 270 
Lorentz, 271 
Mathieu, 300
of motions, 127 
octahedral, 184 
order of, 47 
orthogonal, 124, 271 
point, 168 
product, 61 
projective, 296 
quaternion, 48 
quotient, 67 
real algebraic, 289 
relations in, 220 
rotation, 125 
simple, 201, 299 
special linear, 271 
special orthogonal, 124, 271 
special unitary, 271 
spin, 278 
sporadic, 300 
symmetric, 43 
of symmetries, 156 
symplectic, 271 
tetrahedral, 184 
translation, 167 
translation in, 292 
triangle, 235 
unitary, 252, 271 
Group homomorphism, 51 
kernel of, 51 
Group operation, 176, 309 
Group representation, 308 
Groups: 
abelian, Structure Theorem, 472 
classification of , 49 
homomorphism of, 51 
isomorphism of, 49 


Haar measure, 314 
Half integer, 413 
Half lattice point, 417 
Hermitian form, 250 
Hermitian matrix, 251
Hermitian operator, 253 
Hermitian product, 250 
Hermitian symmetry, 250 
Hilbert Basis Theorem, 469 
Hilbert Nullstellensatz, 371 
Homeomorphism, 595 
Homogeneity, 292 
Homogeneous equation, 16 
Homomorphism: 

of groups, 51 

image of, 51 

of modules, 451 

of rings, 353 
Hopf fibration, 276, 280 
Hyperboloid, 258 
Hypervector, 96 


Icosahedral group, 184 
Ideal, 356 

generated by a set, 357 

maximal, 370 

norm of, 425 

prime, 420 

principal, 357 

product, 419 

proper, 357 

unit, 357 

zero, 357 
Ideal class, 417, 425 
Ideal class group, 429 
Ideal element, 356 
Ideals, cancellation law for, 422 
Idempotent element, 382 
Identities, permanence of, 456 
Identity, 456 
Identity element, 41 
Identity matrix, 6 
Image: 

of an element, 586 

of a homomorphism, 51 

inverse, 586 

of a map, 586 
Imaginary part, 137 
Inclusion, ordering by, 588 
Inclusion map, 51 
Indefinite form, 243 
Independent elements, 454 
Independent, linearly, 88, 101 
Independent submodules, 472 
Independent subspaces, 102
Index:

column, 1

multi, 352

row, 1

of a subgroup, 57 
Indices, 25 
Induced law of composition, 44 


Induced representation, 3 3 
Induction, 590 

complete, 380, 592 
Induction axiom, 343 
Inductive definition, 348
Inequality: 

Schwarz, 248 

triangle, 248 
Infinite cyclic group, 46
Infinite dimensional space, 100 
Infinitesimal element, 287, 365 
Infinitesimal tangent, 288 
Initial conditions, 137 
Injection, 586 
Injective function, map, 586 
Integer: 

algebraic, 410 

half, 413 

square-free, 411
Integers: 

congruence of, 64 

Gauss, 345 

ring of, 348, 413 
Integral domain, 368 
Intermediate field, 542 
Interpolation, Lagrange, 444 
Intersection: 

multiplicity of, 387 

of subgroups, 60 

of subsets, 602 
Invariant form, 311 
Invariant subspace, 116, 314
Inverse, 42 

left, 7 

right, 7 
Inverse function, 586 
Inverse image, 55, 586 
Inverse matrix, 7 
Invertible element, 42 
Invertible matrix, 6 
Irreducible algebraic curve, 387 
Irreducible character, 316 
Irreducible element, 392 
Irreducible factorization, 395 
Irreducible polynomial, 390 


Irreducible polynomial for an element, 494 


Irreducible representation, 315 
Isometry, 156 
Isomorphic field extensions, 496 
Isomorphism: 

of branched coverings, 519 

class, 49 

of field extensions, 496 

of groups, 49 

of modules, 451 

of representations, 316 

of rings, 353 

of vector spaces, 87 




Jacobi identity, 291 
Jordan block, 480 
Jordan form, 480 


Kaleidoscope, 166 
Kernel: 


of a group homomorphism, 52 
of a linear transformation, 110 
of a module homomorphism, 451 


of a ring homomorphism, 356 
Killing form, 304 
Klein four group, 48 
Kronecker, 403, 570 
Kummer extension, 566 


Lagrange, 560 
Lagrange interpolation, 444 
Lagrange’s Theorem, 58 
Latitude, 274 
Lattice, 168 
Lattice group, 172 
Lattice point, half, 417 
Lattices, similar, 397, 425 
Laurent polynomials, 367 
Law of composition, 39 
external, 80 
induced, 44 
Leading coefficient, 350 
Left coset, 57 
Left inverse, 7 
Left multiplication, 9, 176 
Left operation, 176 
Left translation, 292 
Length of a vector, 125, 247 
Lie algebra, 291 
Lie bracket, 290 
Lie type, group of, 299 
Line, 401 
tangent, 387 
Linear combination, 10, 87 
finite, 100 
formal, 94 
Linear equation, 8 
Linear group, 270 
dimension of, 293 
Linearity, conjugate, 250 
Linearly dependent, 88, 101 
Linearly independent, 88, 101 
Linear operator, 270 
Linear relation, 88 
Linear transformation, 109 
kernel of, 110 
matrix of, 112 
restriction of, 116 
Localization of a ring, 385 
Longitude, 274 
Lorentz form, 243 
Lorentz group, 271 


Lorentz transformation, 271 
Lüroth’s Theorem, 555


Main Lemma, 422 
Main theorem of Galois theory, 542 
Manifold, 596 
Map: 
bijective, 586 
continuous, 595 
domain of, 585 
fibre of , 55 
image of, 585 
inclusion, 51 
injective, 586 
range of, 585 
surjective, 586 
zero, 353 
Mapping property: 
of the free group, 220 
of products, 62 
of quotient groups, 221 
of quotient modules, 452 
of quotient rings, 360 
Maschke’s Theorem, 316 
Matrices: 
congruent, 270 
similar, 116 
Matrix, 1
adjoint, 29, 251 
of change of basis, 98 
diagonal, 6 
elementary, 11 
exponential of, 138 
of a form, 239 
Hermitian, 251 
identity, 6 
inverse, 7 
invertible, 6 
of a linear transformation, 112 
nilpotent, 32 
normal, 259 
orthogonal, 124 
permutation, 25 
positive, 119 
positive definite, 241, 252 
presentation, 465 
row echelon, 14 
scalar, 27 
skew-symmetric, 260 
symmetric, 238 
trace of, 98 
transpose, 18 
triangular, 6 
unitary, 252 
upper triangular, 6 
zero, 6 
Matrix addition, 2 
Matrix entries, 1 
Matrix multiplication, 3
Matrix representation, 308
Matrix unit, 11
Mathieu group, 300
Maximal element, 588
Maximal ideal, 370
Measure, Haar, 313
Minimal polynomial, 489
Minkowski’s Lemma, 427
Minors, 153, 484-5, 491
Minors, expansion by, 20
Modular arithmetic, 64
Module, 450
basis of, 454
faithful, 491
finitely generated, 454
free, 454
generators of, 454
presentation of, 465
rank of, 455
relations in, 464
simple, 484
Modules:
direct sum of, 471
homomorphism of, 451
isomorphism of, 451
product of, 474
Structure Theorem for, 475
Monic polynomial, 350
Monomial, 350
Monster, 300
Motion:
orientation-preserving, reversing, 128, 157
rigid, 127, 156
Motions, group of, 127
Multi-index, 352
Multiple root, 377, 508
Multiplication:
coset, 68
left, 9, 176
matrix, 3
right, 18
scalar, 2, 78, 86
Multiplication table, 40
Multiplicative set, 384
Multiplicity of intersection, 387
Multi-valued function, 518


Nakayama Lemma, 491
Natural numbers, 348
Negative definite, 264
Neighborhood, 594
Nilpotent element, 365
Nilpotent matrix, 32
Nilpotent operator, 146
Nilradical, 381
Noetherian ring, 468
Noncommutative ring, 345
Nondegenerate form, 244
Nonsingular operator, 121
Nonsingular point, 387
Norm:
of an element, 414
Frobenius, 153
of an ideal, 425
Normalizer, 204
Normal matrix or operator, 259
Nullity, 110
Null space of a form, 244
Nullstellensatz, 371
Null vector, 244
Number:
algebraic, 345
class, 417, 426
Fibonacci, 154
transcendental, 345
Number field, 450
quadratic, 411
Numbers, natural, 348


Octahedral group, 184
Odd permutation, 26
One-parameter subgroup, 283
Open ball, 593
Open set, 594
Operation:
elementary, 18
faithful, 183
of a group, 176, 309
left, 176
partial, 227
restriction of, 180
transitive, 177
Operator, 115
determinant of, 123
Hermitian, 253
linear, 270
nilpotent, 146
nonsingular, 121
normal, 259
orthogonal, 126, 255
row, 12
shift, 120, 477
singular, 121
symmetric, 255
trace of, 123
unipotent, 153
unitary, 253
Orbit, 177
Order:
of an element, 47
of a finite field, 509


of a group, 47 

by inclusion, 588 

partial, 588 

of a set, 587 

total, 588 
Ordered set, 87, 588 
Orientation-preserving or reversing motion, 128, 157 
Orthogonal basis, 244 
Orthogonal complement, 243 
Orthogonal group, 124, 270
Orthogonality relations, 318 
Orthogonal matrix, 124 
Orthogonal operator, 126, 255 
Orthogonal projection, 249 
Orthogonal representation of SU2, 276
Orthogonal vectors, 126, 241, 252 
Orthonormal basis, 126, 241, 252


P-group, 199 
Paraboloid, 258 
Partial fractions, 441 
Partially symmetric function, 561 
Partial operation, 227 
Partial ordering, 588 
Partition, 53 
Path, 77 
Path-connected, 77 
Peano’s axioms, 348 
Permanence of identities, 456 
Permutation, 25, 43, 211, 586 
cyclic, 25 
even, 26 
odd, 26 
sign of, 26 
Permutation matrix, 25 
Permutation representation, 182, 322 
Pick’s Theorem, 490 
Pigeonhole principle, 587
Pivot, 14 
Plane, translation in, 157 
Point, fixed, 162 
nonsingular, 387 
singular, 387, 405 
Point group, 168 
Polar decomposition, 304 
Pole, 373 
Polynomial, 350
characteristic, 121 
cyclotomic, 405 
degree of, 350 
evaluation of, 353 
irreducible, 390, 494 
Laurent, 367 
minimal, 489 
monic, 350 
primitive, 399 
residue of, 354 
Positive definite, 241, 252 
Positive matrix, 119
Presentation matrix, 465 
Presentation of a module, 465 
Prime: 

Gauss, 406 

ramified, 425 

split, 425 
Prime element, 395 
Prime factorization, 395 
Prime field, 83 
Prime ideal, 385, 420 
Primitive element of a field extension, 552 
Primitive element of a lattice, 172 
Primitive polynomial, 399 
Principal ideal, 357 
Principal ideal domain, 396 
Principle, Substitution, 353 
Product: 

mapping property of, 62 

of modules, 474

of subsets of a group, 66 
Product group, 61 
Product ideal, 419 
Product ring, 380 
Product set, 602 
Projection, 61 

orthogonal, 249 
Projective group, 296 
Projective space, 277 
Proper divisor, 392 
Proper ideal, 357 
Proper subgroup, 45 
Proper subspace, 87 
Pure transcendental extension, 525 
Pythagoras’ Theorem, 125, 503 


Quadratic extension, 497 
Quadratic form, 256 
Quadratic number field, 411 
discriminant of, 413 
Quadratic reciprocity, 440 
Quadric, 256 
Quartic equation, 560 
Quaternion group, 48 
Quaternions, 306 
Quillen, 482 
Quintic equation, 570 
Quotient group, 67 
mapping property of, 221 
Quotient module, 452 
mapping property of, 452 
Quotient ring, 359 
mapping property of, 360 


Radicals, 571 
Ramified prime, 425 
Range of a map, 585 
Rank, 111
of a free module, 455 
Rational canonical form, 479 
Rational function, 370, 516
degree of, 535
Ray, 280
Real algebraic group, 289 
Real algebraic set, 286 
Real number, constructible, 502 
Real part, 517 
Real subfield, 568 
Reciprocity: 
Frobenius, 343 
quadratic, 440 
Recursive definition, 348 
Reduced word, 217 
Reducible representation, 315 
Reduction, row, 12 
Reflection, 157 
glide, 157 
Reflexive relation, 53 
Regular representation, 323 
Relation: 
equivalence, 53 
linear, 88 
reflexive, 53 
symmetric, 53 
transitive, 53 
Relations: 
complete set, 464 
in a group, 220 
in a module, 464 
orthogonality, 318 
in a ring, 361 
Relation vector, 464 
Representation, 308 
adjoint, 304 
complex, 310 
conjugate, 330 
continuous, 313 
dimension of, 308 
faithful, 308 
of a group, 308 
induced, 343 
irreducible, 315 
matrix, 308 
permutation, 182, 322 
reducible, 315 
regular, 322 
sign, 320 
of SU2, orthogonal, 276 
unitary, 311 
Representations: 
direct sum of, 315 
isomorphism of, 316 
Representative element, 55 
Residue class, 64 
Residue of a polynomial, 354 


Resolvent cubic, 564 
Restriction: 
crystallographic, 169 
of a form, 248 
of a linear transformation, 116
of an operation, 181
to a subgroup, 60
Riemann existence theorem, 519 
Riemann surface, 376, 518 
Right coset, 59 
Right inverse, 7 
Right multiplication, 18 
Rigid motion, 127, 156 
Ring, 346 
characteristic of, 358 
of integers, 348, 413 
localization of, 385
noetherian, 468 
noncommutative, 346 
quotient, 359 
relations in, 361 
zero, 347 
Ring homomorphism, 353 
kernel of, 356 
Rings: 
extension of, 364
homomorphism of, 353
isomorphism of, 353
product of, 380
Root: 
multiple, 508 
of unity, 512 
Rotation, 124, 157 
Rotational symmetry, 156 
Rotation group, 125 
Row echelon matrix, 14 
Row index, 1
Row operator, 12 
Row reduction, 12 
Row vector, 2 


Ruler and compass construction, 500 


Scalar, 2 

Scalar matrix, 52 

Scalar multiplication, 2, 78, 86 
Schur’s Lemma, 326, 331, 484 
Schwarz Inequality, 248 


Second isomorphism Theorem, 236, 484 


Self-adjoint, 251 

Semidefinite, 263 

Semigroup, 77 

free, 217 

Set: 
bounded, 595 
cardinality of, 586
centrally symmetric, 427 
closed, 594 




compact, 595 

convex, 427 

finite, 586 

multiplicative, 384 

open, 593-94 

ordered, 87 

order of, 587

real algebraic, 286 
Sheets, 520 
Shift operator, 120, 477 
Sieve, 403 
Signature of a form, 245 
Sign of a permutation, 26 
Sign representation, 320 
Similar lattice, 398, 425 
Similar matrices, 116 
Simple group, 201, 295 

finite, 299 
Simple module, 484 
Simply connected, 278 
Single-valued function, 518 
Singular operator, 121 
Singular point, 387, 405 
Size function, 397 
Skew-symmetric form, 238, 266


Skew-symmetric matrix, 260 


Space:
Euclidean, 247
projective, 277 
vector, 86 
Span, 88, 100 
Special linear group, 271 
Special orthogonal group, 124, 271 
Special unitary group, 271 
Spectral Theorem, 253 
Sphere, 273 
Spin, 277 
Spin group, 277 
Split prime, 425 
Splitting field, 540 
Sporadic group, 300 
Square-free integer, 411 
Stabilizer, 177 
Standard basis, 26, 90, 454 
symplectic, 261
Standard Hermitian product, 250 
Stark, 416 
Structure Theorem: 
for abelian groups, 472 
for modules, 475 


Subgroup:
conjugate, 179


generators of, 48 

index of, 57 

normal, 52 

one-parameter, 283 

proper, 45 

restriction to, 60 

Sylow, 206 

transitive, 560 
Submodule, 451 
Submodules: 

direct sum of, 471 

independent, 472 
Subring, 345 
Subset, 602 

proper, 602 
Subspace, 79 

G -invariant, 314 

proper, 87 

T-invariant, 116, 314 
Subspaces: 

direct sum of, 102 

independent, 102 

sum of, 102 
Substitution Principle, 353 
Successor function, 348 
Sum of subspaces, 102 
Surface, Riemann, 376, 518 
Surjection, 586 
Surjective map, 586 
Suslin, 482
Sylow subgroup, 206 
Sylow Theorem, 205 
Sylvester’s Law, 245 
Symbolic adjunction, 506 
Symmetric form, 238 
Symmetric function, 547 

elementary, 547 
Symmetric group, 43 
Symmetric matrix, 238 
Symmetric operator, 255 
Symmetric relation, 53 
Symmetries, group of, 156 
Symmetry, 156, 176 

bilateral, 155 

glide, 156 

Hermitian, 250 

rotational, 156 

translational, 156 
Symplectic basis, 261 
Symplectic group, 271 


Table: 
character, 320 
multiplication, 40 
Tangent, infinitesimal, 288 
Tangent line, 387 
Tangent vector, 236 
Tangent vector field, 295
Tartaglia, 543
Tetrahedral group, 184
Third Isomorphism Theorem, 236, 360, 484
Todd-Coxeter Algorithm, 223
Torus, 524
Total ordering, 588
Trace of a matrix or an operator, 123
Transcendence basis, 525
Transcendence degree, 526
Transcendental element, 493
Transcendental extension, 525
Transcendental number, 346
Transform, Cayley, 306
Transformation:
G-invariant, 325
linear, 109
Lorentz, 271
Transitive operation, 177
Transitive relation, 53
Transitive subgroup, 560
Translation, 128, 157
in a group, 292
left, 292
in the plane, 157
Translational symmetry, 156
Translation group, 167
Transpose matrix, 18
Transposition, 25, 212
Triangle group, 235
Triangle Inequality, 248
Triangular matrix, 6
Trisection of an angle, 505
Trivial solution, 16


Union of subsets, 602
Unipotent element, 381
Unipotent operator, 153
Unique factorization domain, 394
Unit, 347
matrix, 10
Unitary group, 252, 271
Unitary matrix, 252
Unitary operator, 253
Unitary representation, 311
Unit element, 347
Unit ideal, 357
Unit vector, 124
Unity, root of, 512
Upper bound, 588
Upper triangular matrix, 6


Vandermonde determinant, 36
Variety, algebraic, 373
Vector, 78, 450
characteristic, 117
column, 2
coordinate, 94, 455
length of, 125, 247
null, 244
relation, 464
row, 2
tangent, 286
unit, 124
Vector addition, 78, 86
Vector bundle, 483
Vector field, tangent, 295
Vectors:
angle between, 126, 248
distance between, 125
orthogonal, 126, 241
Vector space, 86
basis of, 90
dimension of, 93
finite-dimensional, 91
infinite-dimensional, 100
Vector spaces:
direct sum of, 102
isomorphism of, 87


Weight, 550
Weighted degree, 549
Wiggly arrow, 586
Wilson’s Theorem, 105
Word, 217
closed, 233
reduced, 217
Word problem, 223


Zero divisor, 368
Zero ideal, 357
Zero map, 353
Zero matrix, 6
Zero ring, 347
Zorn’s Lemma, 588


